-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-50630][SQL] Fix GROUP BY ordinal support for pipe SQL AGGREGATE operators #49248
Conversation
cc @cloud-fan @gengliangwang this PR fixes |
// and replace it with the corresponding attribute from the child operator. | ||
case Literal(v: Int, IntegerType) if conf.groupByOrdinal => | ||
numGroupByOrdinalReferences += 1 | ||
newGroupingExpressions += UnresolvedOrdinal(numGroupByOrdinalReferences) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we have mixed group by col and group by ordinal? e.g.
a plan with output [c1, c2, c3, c4, value]
|> aggregate sum(value) group by c1, c2, c3, 4
This should be identical to group by c1, c2, c3, c4
but with the current implementation it becomes group by c1, c2, c3, c1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!! This should be UnresolvedOrdinal(newAggregateExpressions.length + 1)
instead of UnresolvedOrdinal(numGroupByOrdinalReferences)
. I made this update and added a test case to cover this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cloud-fan for your review!
// and replace it with the corresponding attribute from the child operator. | ||
case Literal(v: Int, IntegerType) if conf.groupByOrdinal => | ||
numGroupByOrdinalReferences += 1 | ||
newGroupingExpressions += UnresolvedOrdinal(numGroupByOrdinalReferences) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!! This should be UnresolvedOrdinal(newAggregateExpressions.length + 1)
instead of UnresolvedOrdinal(numGroupByOrdinalReferences)
. I made this update and added a test case to cover this.
thanks, merging to master! |
What changes were proposed in this pull request?
This PR fixes GROUP BY ordinal support for pipe SQL AGGREGATE operators.
It adds a new
UnresolvedPipeAggregateOrdinal
expression to represent these ordinals. In this context, the ordinal refers to the one-based position of the column in the input relation. Note that this behavior is different from GROUP BY ordinals in regular SQL, wherein the ordinal refers to the one-based position of the column in the SELECT clause instead.For example:
This PR also makes a small fix for
|> UNION
(and other set operations) to prefer future pipe operators to apply on the result of the entire union, rather than binding to the right leg of the union only (to allay reported confusion during testing). For example,values (0, 1) s(x, y) |> union all values (2, 3) t(x, y) |> drop x
will succeed rather than report an error that the number of columns does not match.Why are the changes needed?
The current implementation has a bug where the ordinals are sometimes mistakenly retained as literal integers.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
This PR adds new golden file based test coverage.
Was this patch authored or co-authored using generative AI tooling?
No.