[CALCITE-6363] Introduce a rule to derive more filters from inner join … #3760

frostruan · 2024-04-13T08:31:05Z

https://issues.apache.org/jira/browse/CALCITE-6363

sonarcloud · 2024-04-13T11:57:50Z

Quality Gate passed

Measures
0 Security Hotspots
84.5% Coverage on New Code
0.0% Duplication on New Code

jamesstarr

I would prefer a more complete solution that handles things like predicate pull-ups and null-producing joins. If a partial solution is accepted, it might discourage a more complete solution down the line, because the legacy rule would become the base that any expansion has to build on.

jamesstarr · 2024-04-17T16:59:50Z

core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveEquivalenceFilterRule.java

+    final RexNode newCondition =
+        deriveEquivalenceCondition(simplify, rexBuilder, originalCondition);
+
+    if (arePredicatesEquivalent(rexBuilder, simplify, originalCondition, newCondition)) {


This is not sufficient to prevent infinite loops once generalized to pull up predicates.

Would you mind giving me an example? I can't imagine a situation where an infinite loop would occur now. :)

jamesstarr · 2024-04-17T17:05:02Z

core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveEquivalenceFilterRule.java

+    super(config);
+  }
+
+  @Override public void onMatch(RelOptRuleCall call) {


This should use the RelMetadataQuery for extracting filters from below. Please look at the usage of RelMetadataQuery::getPulledUpPredicates in ReduceExpressionsRule.

Thanks for pointing that out. I'll check and learn.

jamesstarr · 2024-04-17T17:06:54Z

core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveEquivalenceFilterRule.java

+      return;
+    }
+
+    final Filter newFilter = filter.copy(filter.getTraitSet(), filter.getInput(), newCondition);


I think it is preferable to generate:

FILTER
  JOIN
    FILTER ....
    FILTER ....

After a new Filter is generated, FilterJoinRule will push down the predicate.

In fact, I agree with you; the current implementation is a compromise. In terms of the order of rule application, the rules in Calcite are order-insensitive, which allows them to remain completely independent, without any dependencies. But if the predicate were pushed down in this rule, I think the rule would do too many things. I want to keep the rule as simple and atomic as possible, focusing on one thing, so it stops at generating a new Filter.

If in the future, we can allow rules to declare their own application order (such as applying before/after other rules), and then perform topological sorting on the rules, then it may be enough to just generate a new Filter.

If we had a better extraction method, then we would see an infinite loop. This is my concern with such a partial solution: it will be difficult to build on top of.

SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND ((t1.c1, t2.c1) IN ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10)) OR (t1.c1, t1.c2) IN ((3, 4), (5, 6), (7, 8)))

Given the above example, ideally you would get something like the following tree:

FILTER (t1.c1, t2.c1) IN ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10)) OR (t1.c1, t1.c2) IN ((3, 4), (5, 6), (7, 8))
  JOIN on t1.c1 = t2.c1
    FILTER t1.c1 IN (1, 3, 5, 7, 9)
      SCAN t1
    SCAN t2
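The derived filter in the tree above, t1.c1 IN (1, 3, 5, 7, 9), can be reproduced by projecting every disjunct of the OR onto t1.c1 and taking the union. The following is a minimal sketch of that idea in plain Java. It does not use Calcite's RexNode API, and the class and method names (TupleInList, SingleColumnProjection, project, example) are hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of one disjunct of the form (columns...) IN (tuples...). */
final class TupleInList {
  final List<String> columns;
  final List<int[]> tuples;

  TupleInList(List<String> columns, List<int[]> tuples) {
    this.columns = columns;
    this.tuples = tuples;
  }
}

final class SingleColumnProjection {
  /**
   * Projects a disjunction of tuple IN-lists onto a single column.
   * Returns null when some disjunct does not mention the column, because
   * that disjunct then places no bound on the column at all.
   */
  static SortedSet<Integer> project(List<TupleInList> disjuncts, String column) {
    SortedSet<Integer> values = new TreeSet<>();
    for (TupleInList disjunct : disjuncts) {
      int index = disjunct.columns.indexOf(column);
      if (index < 0) {
        return null;
      }
      for (int[] tuple : disjunct.tuples) {
        values.add(tuple[index]);
      }
    }
    return values;
  }

  /** The two disjuncts from the query in the review comment. */
  static List<TupleInList> example() {
    return Arrays.asList(
        new TupleInList(Arrays.asList("t1.c1", "t2.c1"),
            Arrays.asList(new int[] {1, 2}, new int[] {3, 4}, new int[] {5, 6},
                new int[] {7, 8}, new int[] {9, 10})),
        new TupleInList(Arrays.asList("t1.c1", "t1.c2"),
            Arrays.asList(new int[] {3, 4}, new int[] {5, 6}, new int[] {7, 8})));
  }

  public static void main(String[] args) {
    // Prints: t1.c1 IN [1, 3, 5, 7, 9]
    System.out.println("t1.c1 IN " + project(example(), "t1.c1"));
  }
}
```

Projecting onto t2.c1 returns null in this sketch, since the second disjunct does not constrain that column, so no single-column filter on t2 can be derived from the OR alone.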

What I see here reminds me of HIVE-25758 and similar bugs around JoinPushTransitivePredicatesRule, for which you can find more details in this slide deck from a Calcite meetup from last year: https://www.slideshare.net/slideshow/debugging-planning-issues-using-calcites-builtin-loggers/256567632 (slides 45-53).

As @jamesstarr suggests, once you can pull up more predicates, the burden of converging falls on RexSimplify. If it fails to simplify the "redundant" part, the predicate will always be identified as new: it is pushed from one side of the join to the other, pushed down, and merged with the existing predicates, which are then again identified as new predicates, and you go into a loop.
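The loop described here can be illustrated with a toy model, sketched below under strong assumptions: conjuncts are plain strings, the join equality t1.c1 = t2.c1 is implicit, and "simplification" is nothing more than removing duplicate conjuncts. None of this is Calcite code, and ConvergenceSketch, fireOnce and firingsUntilStable are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class ConvergenceSketch {
  /**
   * One rule firing: copy every conjunct across the implicit equality
   * t1.c1 = t2.c1, in both directions, and merge it into the condition.
   * With simplify == false the merged condition keeps redundant copies.
   */
  static List<String> fireOnce(List<String> conjuncts, boolean simplify) {
    List<String> merged = new ArrayList<>(conjuncts);
    for (String conjunct : conjuncts) {
      if (conjunct.startsWith("t1.c1 ")) {
        merged.add(conjunct.replace("t1.c1", "t2.c1"));
      } else if (conjunct.startsWith("t2.c1 ")) {
        merged.add(conjunct.replace("t2.c1", "t1.c1"));
      }
    }
    return simplify ? new ArrayList<>(new LinkedHashSet<>(merged)) : merged;
  }

  /**
   * Returns the number of firings until the condition stops changing,
   * or -1 if it is still changing after maxFirings attempts.
   */
  static int firingsUntilStable(List<String> start, boolean simplify, int maxFirings) {
    List<String> current = start;
    for (int n = 1; n <= maxFirings; n++) {
      List<String> next = fireOnce(current, simplify);
      if (next.equals(current)) {
        return n;
      }
      current = next;
    }
    return -1;
  }

  public static void main(String[] args) {
    List<String> start = Arrays.asList("t1.c1 > 5");
    System.out.println(firingsUntilStable(start, true, 10));   // 2
    System.out.println(firingsUntilStable(start, false, 10));  // -1
  }
}
```

When the redundant copies are removed, the second firing produces nothing new and the process stops. When they are not, every firing yields a condition that differs from the previous one, which is the non-terminating behaviour the comment warns about.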

I agree that having extra power here could be dangerous, but as long as it's not part of the core rules and it stays optional, it would be good to have it IMO.

jamesstarr · 2024-04-17T17:10:04Z

core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveEquivalenceFilterRule.java

+import java.util.stream.Collectors;
+
+/**
+ *  Planner rule that derives more equivalent predicates from inner


You can transitively generate predicates for the null-generating sides of left and right joins.

Ok. Let me try this.
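As a rough sketch of the suggestion above: a predicate known to hold on the preserved side of an outer join can be copied through an equi-join key to the null-generating side, but not the other way round, because filtering the preserved input would drop rows the join must keep. The enum and method names below are hypothetical and only model that direction rule; they are not Calcite's JoinRelType API.

```java
final class OuterJoinInference {
  enum JoinKind { INNER, LEFT, RIGHT, FULL }

  /**
   * True if a predicate that holds on the left input may be copied,
   * through an equi-join key, into a filter on the right input.
   * For LEFT joins the right side is null-generating, so this is safe.
   */
  static boolean canInferLeftToRight(JoinKind kind) {
    return kind == JoinKind.INNER || kind == JoinKind.LEFT;
  }

  /**
   * True if a predicate that holds on the right input may be copied
   * into a filter on the left input. For RIGHT joins the left side is
   * null-generating, so this is safe.
   */
  static boolean canInferRightToLeft(JoinKind kind) {
    return kind == JoinKind.INNER || kind == JoinKind.RIGHT;
  }

  public static void main(String[] args) {
    for (JoinKind kind : JoinKind.values()) {
      System.out.println(kind + ": left->right=" + canInferLeftToRight(kind)
          + ", right->left=" + canInferRightToLeft(kind));
    }
  }
}
```

In this model a FULL join allows inference in neither direction, since both inputs are preserved.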

jamesstarr · 2024-04-17T17:19:29Z

core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveEquivalenceFilterRule.java

+    call.transformTo(newFilter);
+
+    // after derivation, the original filter can be pruned
+    call.getPlanner().prune(filter);


I am pretty sure we do not want to do this. If there is a large tree, then this could cause problems.

OK, this might be too radical.

Does this break anything if removed? Why would it be too radical?

jamesstarr · 2024-04-17T17:22:27Z

core/src/main/java/org/apache/calcite/rel/rules/JoinDeriveEquivalenceFilterRule.java

+ */
+
+@Value.Enclosing
+public class JoinDeriveEquivalenceFilterRule


This name is misleading since it is not just equivalency, no?

Yes, this is indeed a bit misleading. I originally wanted to express the derivation of rules based on equivalent rewriting. I will rename it.

jamesstarr · 2024-04-17T17:30:18Z

core/src/main/java/org/apache/calcite/rex/RexUtil.java

+   * representation lexicographic order, which allows constant to always be on the
+   * right side of the expression. See {@link RexNormalize#reorderOperands} for details.
+   */
+  public static RexNode canonizeNode(RexBuilder rexBuilder, RexNode expression) {


I think it is preferable to have helper functions for interacting with the existing nodes rather than creating a copy.

Ok. I'll fix this.

frostruan · 2024-04-18T03:59:20Z

REALLY appreciate your review. @jamesstarr It's very helpful. I'll address these as soon as possible.


frostruan · 2024-07-17T13:22:32Z

Hi all
I tried to re-implement predicate derivation in a new commit. Compared with a formal PR, it is more like a POC now. Looking forward to your suggestions.

Why this change was made

As I commented on https://issues.apache.org/jira/browse/CALCITE-6363, predicate inference is not available in VolcanoPlanner currently, so this PR tries to propose a temporary solution to help implement predicate inference.

Main changes

1. Introduce a new method in RelOptUtil to implement predicate inference. It basically copies RelMdPredicates.JoinConditionBasedPredicateInference, with the following simplifications:
1.1 Because the predicate is passed directly from the join, rather than pulled up from the filter node after the predicate is pushed down, there is no need to convert RexInputRef.
1.2 FilterJoinRule will classify the generated predicates, so the new method does not need to classify them and just returns the generated predicates directly.
2. Call this method in FilterJoinRule, just after simplifying outer joins.
3. Introduce a new configuration item in FilterJoinRule to determine whether to enable predicate inference, turned off by default.
4. Add some unit tests.
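The opt-in behaviour described in the list above can be sketched as an immutable configuration object whose inference flag defaults to off. This is only a toy model with conjuncts as strings. The names OptInInferenceSketch, withInferPredicates and expand are hypothetical, and the actual name of the new FilterJoinRule configuration item is not given in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OptInInferenceSketch {
  /** Inference is off unless a caller explicitly enables it. */
  static final OptInInferenceSketch DEFAULT = new OptInInferenceSketch(false);

  private final boolean inferPredicates;

  private OptInInferenceSketch(boolean inferPredicates) {
    this.inferPredicates = inferPredicates;
  }

  OptInInferenceSketch withInferPredicates(boolean enabled) {
    return new OptInInferenceSketch(enabled);
  }

  /**
   * Returns the join conjuncts, plus (when enabled) the conjuncts obtained by
   * copying each predicate across every "x = y" equality in the condition.
   * Classifying the result by join side is left to the caller.
   */
  List<String> expand(List<String> conjuncts) {
    if (!inferPredicates) {
      return new ArrayList<>(conjuncts);
    }
    Set<String> result = new LinkedHashSet<>(conjuncts);
    for (String equality : conjuncts) {
      String[] sides = equality.split(" = ");
      if (sides.length != 2) {
        continue; // not an equality of the form "x = y"
      }
      for (String conjunct : conjuncts) {
        if (conjunct.equals(equality)) {
          continue;
        }
        if (conjunct.startsWith(sides[0] + " ")) {
          result.add(sides[1] + conjunct.substring(sides[0].length()));
        } else if (conjunct.startsWith(sides[1] + " ")) {
          result.add(sides[0] + conjunct.substring(sides[1].length()));
        }
      }
    }
    return new ArrayList<>(result);
  }

  public static void main(String[] args) {
    List<String> condition = Arrays.asList("t1.c1 = t2.c1", "t1.c1 > 5");
    System.out.println(DEFAULT.expand(condition));
    System.out.println(DEFAULT.withInferPredicates(true).expand(condition));
  }
}
```

With the default configuration the condition is returned unchanged, which matches the stated goal of having no impact on existing plans unless the option is turned on.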

Impact

Predicate inference is turned off by default, so it has no impact on the current logic
Applicable to Inner Join, Outer Join and Semi-Anti Join

frostruan force-pushed the calcite-6363 branch 2 times, most recently from e2a0066 to fd26369 Compare April 13, 2024 11:38

jamesstarr reviewed Apr 17, 2024

[CALCITE-6363] Introduce a rule to derive more filters from inner joi…

fa1d7a8

…n condition

frostruan force-pushed the calcite-6363 branch from fd26369 to fa1d7a8 Compare July 17, 2024 13:20
