[SPARK-55533][SQL] Support IGNORE NULLS / RESPECT NULLS for collect_set #54329

Open
yaooqinn wants to merge 1 commit into apache:master from yaooqinn:SPARK-55533

What changes were proposed in this pull request?

This PR adds IGNORE NULLS / RESPECT NULLS support to collect_set, mirroring the existing collect_list behavior added in SPARK-55256.

  • collect_set(expr) — default, skips nulls (unchanged behavior)
  • collect_set(expr) IGNORE NULLS — explicitly skips nulls
  • collect_set(expr) RESPECT NULLS — includes null in the result set
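The three forms above can be illustrated with a small pure-Python model. This is only a sketch of the described semantics, not Spark code: `collect_set_model` and its `ignore_nulls` flag are hypothetical names, and `None` stands in for SQL NULL.

```python
from typing import Hashable, Iterable, Optional, Set


def collect_set_model(
    values: Iterable[Optional[Hashable]], ignore_nulls: bool = True
) -> Set[Optional[Hashable]]:
    """Toy model of the described collect_set semantics (hypothetical helper).

    ignore_nulls=True  models the default and IGNORE NULLS: nulls are skipped.
    ignore_nulls=False models RESPECT NULLS: a single null may appear in the set.
    """
    result: Set[Optional[Hashable]] = set()
    for value in values:
        if value is None and ignore_nulls:
            continue  # skip nulls, as in the default / IGNORE NULLS forms
        result.add(value)  # duplicates (including repeated nulls) collapse
    return result


rows = [1, None, 2, 1, None]

default_result = collect_set_model(rows)                      # collect_set(expr)
ignore_result = collect_set_model(rows, ignore_nulls=True)    # ... IGNORE NULLS
respect_result = collect_set_model(rows, ignore_nulls=False)  # ... RESPECT NULLS
```

In this model the default and IGNORE NULLS calls give the same result, and RESPECT NULLS adds exactly one null no matter how many null inputs there are. The model returns an unordered set, so it says nothing about element order in the real result array.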

Why are the changes needed?

For consistency: collect_list/array_agg already supports this syntax, but collect_set does not. Users who want to include null in collected sets currently have no way to do so.

Does this PR introduce any user-facing change?

Yes. collect_set now accepts IGNORE NULLS and RESPECT NULLS clauses in SQL.

How was this patch tested?

Added 3 new tests in DataFrameAggregateSuite:

  • collect_set skips nulls by default
  • collect_set with IGNORE NULLS explicitly skips nulls
  • collect_set with RESPECT NULLS preserves null in set

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with GitHub Copilot.

Implementation mirrors the existing CollectList pattern:
- Added ignoreNulls parameter to CollectSet
- Wired CollectSet into FunctionResolution.applyIgnoreNulls
- Handled null safely in eval() for BinaryType
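The description does not show the `eval()` change itself. As a rough illustration of why binary values might need null-aware handling, the following pure-Python sketch assumes that byte arrays are stored in the buffer under a content-comparable form and converted back when the result is produced. The class `CollectSetBinaryModel` and its methods are hypothetical and are not Spark's implementation.

```python
from typing import List, Optional, Set


class CollectSetBinaryModel:
    """Hypothetical model of a set aggregate over binary values.

    Assumption: mutable byte arrays cannot be deduplicated directly, so each
    one is stored as an immutable, content-comparable `bytes` key.
    """

    def __init__(self, ignore_nulls: bool = True) -> None:
        self.ignore_nulls = ignore_nulls
        self.buffer: Set[Optional[bytes]] = set()

    def update(self, value: Optional[bytearray]) -> None:
        if value is None:
            if not self.ignore_nulls:
                self.buffer.add(None)  # keep the null itself; do not wrap it
            return
        self.buffer.add(bytes(value))  # wrap so equal contents deduplicate

    def eval(self) -> List[Optional[bytearray]]:
        # Convert wrapped keys back to byte arrays, passing null through
        # untouched; calling bytearray(None) would raise a TypeError.
        return [None if item is None else bytearray(item) for item in self.buffer]


agg = CollectSetBinaryModel(ignore_nulls=False)
for v in [bytearray(b"ab"), None, bytearray(b"ab"), bytearray(b"cd")]:
    agg.update(v)
result = agg.eval()
```

The point of the sketch is the conversion step in `eval`: once nulls are allowed into the buffer, any per-element unwrapping has to check for null first.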

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>