Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-50630][SQL] Fix GROUP BY ordinal support for pipe SQL AGGREGATE operators #49248

Closed

Conversation

dtenedor
Copy link
Contributor

@dtenedor dtenedor commented Dec 20, 2024

What changes were proposed in this pull request?

This PR fixes GROUP BY ordinal support for pipe SQL AGGREGATE operators.

It adds a new UnresolvedPipeAggregateOrdinal expression to represent these ordinals. In this context, the ordinal refers to the one-based position of the column in the input relation. Note that this behavior is different from GROUP BY ordinals in regular SQL, wherein the ordinal refers to the one-based position of the column in the SELECT clause instead.

For example:

select 3 as x, 4 as y, 5 as z
|> aggregate sum(y) group by 2, 3

> 4, 5, 4

select 3 as x, 4 as y, 5 as z
|> aggregate sum(y) group by 1, 2, 3

> 3, 4, 5, 4

This PR also makes a small fix for |> UNION (and other set operations) to prefer future pipe operators to apply on the result of the entire union, rather than binding to the right leg of the union only (to allay reported confusion during testing). For example, values (0, 1) s(x, y) |> union all values (2, 3) t(x, y) |> drop x will succeed rather than report an error that the number of columns does not match.

Why are the changes needed?

The current implementation has a bug where the ordinals are sometimes mistakenly retained as literal integers.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds new golden file based test coverage.

Was this patch authored or co-authored using generative AI tooling?

No.

commit

commit

commit
@github-actions github-actions bot added the SQL label Dec 20, 2024
@dtenedor dtenedor marked this pull request as ready for review December 20, 2024 00:05
@dtenedor
Copy link
Contributor Author

cc @cloud-fan @gengliangwang this PR fixes |> AGGREGATE GROUP BY ordinal support :)

// and replace it with the corresponding attribute from the child operator.
case Literal(v: Int, IntegerType) if conf.groupByOrdinal =>
numGroupByOrdinalReferences += 1
newGroupingExpressions += UnresolvedOrdinal(numGroupByOrdinalReferences)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have mixed group by col and group by ordinal? e.g.

a plan with output [c1, c2, c3, c4, value]
|> aggregate sum(value) group by c1, c2, c3, 4

This should be identical to group by c1, c2, c3, c4 but with the current implementation it becomes group by c1, c2, c3, c1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!! This should be UnresolvedOrdinal(newAggregateExpressions.length + 1) instead of UnresolvedOrdinal(numGroupByOrdinalReferences). I made this update and added a test case to cover this.

Copy link
Contributor Author

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cloud-fan for your review!

// and replace it with the corresponding attribute from the child operator.
case Literal(v: Int, IntegerType) if conf.groupByOrdinal =>
numGroupByOrdinalReferences += 1
newGroupingExpressions += UnresolvedOrdinal(numGroupByOrdinalReferences)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!! This should be UnresolvedOrdinal(newAggregateExpressions.length + 1) instead of UnresolvedOrdinal(numGroupByOrdinalReferences). I made this update and added a test case to cover this.

@dtenedor dtenedor requested a review from cloud-fan December 20, 2024 23:19
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 08675b1 Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants