-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6363] Introduce a rule to derive more filters from inner join … #3760
base: main
Are you sure you want to change the base?
Conversation
e2a0066
to
fd26369
Compare
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a more complete solution that handled things like using predicate pull ups and handling the null producing joins. If a partial solution is accepted, then it might discourage a more complete solution down the line due to the legacy rule providing more base for expanding the solution.
final RexNode newCondition = | ||
deriveEquivalenceCondition(simplify, rexBuilder, originalCondition); | ||
|
||
if (arePredicatesEquivalent(rexBuilder, simplify, originalCondition, newCondition)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not sufficient to prevent infinite loops once generalized to pull up predicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind give me an example ? Can't imagine situation where an infinite loop would occur now. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here
super(config); | ||
} | ||
|
||
@Override public void onMatch(RelOptRuleCall call) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use the RelMetaDataQuery for extracting filters from below. Please look at RelMetadataQuery::getPulledUpPredicates usage in ReduceExpressionsRule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out. I'll check and learn.
return; | ||
} | ||
|
||
final Filter newFilter = filter.copy(filter.getTraitSet(), filter.getInput(), newCondition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it preferable to generate:
FILTER
JOIN
FILTER
....
FILTER
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a new Filter is generated, FilterJoinRule will push down the predicate.
In fact, I agree with you and the current implementation is a compromise. In terms of the order of rule application, the rules in calcite are order-insensitive, which allows the rules to remain completely independent without any dependencies. But if the predicate is pushed down in this rule, then I think this rule will do too many things. I want to keep the rule as simple and atomic as possible, focusing on one thing, so Just generating a new Filter ends here.
If in the future, we can allow rules to declare their own application order (such as applying before/after other rules), and then perform topological sorting on the rules, then it may be enough to just generate a new Filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had a better extraction method, then we would see an infinite loop. This is my concern with such a partial solution, it will be difficult to build on top of.
SELECT * FROM t1, t2
WHERE t1.c1 = t2.c1
AND ((t1.c1, t2.c1) IN ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))
OR (t1.c1, t1.c2) IN ((3, 4), (5, 6), (7, 8))
Given the above example ideally you would get something like the following tree:
FILTER (t1.c1, t2.c1) IN ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10)) OR (t1.c1, t1.c2) IN ((3, 4), (5, 6), (7, 8))
JOIN on t1.c1 = t2.c1
FILTER t1.c1 IN (1, 3, 5, 7, 9)
SCAN t1
SCAN t2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I see here reminds me of HIVE-25758 and similar bugs around JoinPushTransitivePredicatesRule
, for which you can find more details in this slide deck from a Calcite meetup from last year: https://www.slideshare.net/slideshow/debugging-planning-issues-using-calcites-builtin-loggers/256567632 (slides 45-53).
As @jamesstarr suggests, once you can pull up more predicates, the burden of converging falls onto RexSimplify
, if it fails at simplifying the "redundant" part, the predicate will always be identified as a new predicate that you will push from one side of the join to the other, pushed down and merged with existing predicates, which will again be identified as new predicates and you go into a loop.
I agree that having extra power here could be dangerous, but as long as it's not part of the core rules and it stays optional, it would be good to have it IMO.
import java.util.stream.Collectors; | ||
|
||
/** | ||
* Planner rule that derives more equivalent predicates from inner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can transitively generate predicates for the null generating sides or left and right joins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Let me try this.
call.transformTo(newFilter); | ||
|
||
// after derivation, the original filter can be pruned | ||
call.getPlanner().prune(filter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure we do not want to do this. If their is a large tree, then this could cause problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this might be too radical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this break anything if removed? Why would it be to radical?
*/ | ||
|
||
@Value.Enclosing | ||
public class JoinDeriveEquivalenceFilterRule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is misleading since it is not just equivalency, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is indeed a bit misleading. I originally wanted to express the derivation of rules based on equivalent rewriting. I will rename it.
* representation lexicographic order, which allows constant to always be on the | ||
* right side of the expression. See {@link RexNormalize#reorderOperands} for details. | ||
*/ | ||
public static RexNode canonizeNode(RexBuilder rexBuilder, RexNode expression) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it preferable to have helper functions for interacting with the existing nodes then creating a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll fix this.
REALLY appreciate your review. @jamesstarr It's very helpful. I'll address these as soon as possible. |
Hi all Why this change was been madeAs I commented on https://issues.apache.org/jira/browse/CALCITE-6363, predicate inference is not available in VolcanoPlanner currently, so this PR tries to propose a temporary solution to help implement predicate inference. Main changes
Impact
|
https://issues.apache.org/jira/browse/CALCITE-6363