Skip to content

[SQL] Unused field analysis for aggregates#5601

Open
mihaibudiu wants to merge 1 commit intomainfrom
issue5541
Open

[SQL] Unused field analysis for aggregates#5601
mihaibudiu wants to merge 1 commit intomainfrom
issue5541

Conversation

@mihaibudiu
Copy link
Contributor

Fixes #5541

@mihaibudiu mihaibudiu marked this pull request as draft February 11, 2026 05:25
@mihaibudiu
Copy link
Contributor Author

Interestingly, this pass actually discovers unused aggregate functions which are introduced by the Calcite decorrelator!

@mihaibudiu mihaibudiu marked this pull request as ready for review February 11, 2026 06:10
@mihaibudiu mihaibudiu marked this pull request as draft February 11, 2026 06:33
@mihaibudiu mihaibudiu marked this pull request as ready for review February 11, 2026 23:50
@mihaibudiu mihaibudiu force-pushed the issue5541 branch 2 times, most recently from 887e62d to 03616ff Compare February 11, 2026 23:56
@mihaibudiu mihaibudiu marked this pull request as draft February 12, 2026 00:01
@mihaibudiu
Copy link
Contributor Author

Calcite compiles the following query:

create materialized view q4
                as select
                        o_orderpriority,
                        count(*) as order_count
                from
                        orders
                where
                        o_orderdate >= date '1993-07-01'
                        and o_orderdate < date '1993-07-01' + interval '3' month
                        and exists (
                                select
                                        *
                                from
                                        lineitem
                                where
                                        l_orderkey = o_orderkey
                                        and l_commitdate < l_receiptdate
                        )
                group by
                        o_orderpriority

To a plan containing the following fragment:

              LogicalJoin(condition=[=($0, $9)], joinType=[inner]), id = 574
                LogicalFilter(condition=[SEARCH($4, Sarg[[1993-07-01..1993-10-01)])]), id = 552
                  LogicalTableScan(table=[[schema, orders]]), id = 68
                LogicalAggregate(group=[{0}], agg#0=[MIN($1)]), id = 559
                  LogicalProject(l_orderkey=[$0], $f0=[true]), id = 557
                    LogicalFilter(condition=[<($11, $12)]), id = 555
                      LogicalTableScan(table=[[schema, lineitem]]), id = 70

By using a MIN(true) aggregate to figure out whether the EXISTS subquery produces any rows. Turns out that the result of the MIN is never used, and this new optimization can actually completely remove it, leaving an aggregate... which does nothing. This is implemented much more efficiently by using a linear aggregate, which returns Tup0 if there is any value in the collection, or nothing otherwise.

@mihaibudiu mihaibudiu marked this pull request as ready for review February 12, 2026 00:51
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu
Copy link
Contributor Author

The analysis can also discover that the following SUM aggregate is not used, since it's used only for ORDER BY, which is ignored by default:

                select deptno
                from emp
                group by deptno
                order by sum(sal) filter (where job = 'CLERK')

@mihaibudiu
Copy link
Contributor Author

However, if an aggregate is evaluated just for side-effects (i.e., crash on overflow), removing aggregates is not sound.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SQL] The unused fields analysis does not remove unused aggregates

1 participant