[SPARK-49829][SS] Revise the optimization on adding input to state store in stream-stream join (correctness fix) #48297
Conversation
…ore in stream-stream join (correctness fix)
(force-pushed from d50a906 to 6ddd6ef)
Thank you, @HeartSaVioR .
Is this correctness issue introduced via SPARK-32862 at Apache Spark 3.1.0?
cc @xuanyuanking, @viirya too from #30076
val shouldAddToState =
  !stateKeyWatermarkPredicateFunc(key) && !stateValueWatermarkPredicateFunc(thisRow) &&
  !isLeftSemiWithMatch
I imagine that you considered constructing these predicates (e.g. stateKeyWatermarkPredicateFunc and stateValueWatermarkPredicateFunc) based off the watermark for late events, not the watermark for eviction. I spent some time working this out myself, and I know the subtle reason why this won't work, but I wanted to verify that you also considered this.
It's neither the watermark for late events nor the watermark for eviction - stream-stream join has its own "state watermark", which becomes the "output watermark" of the join. The predicate relies on the state watermark.
Right, but I had initially wondered why we can't construct the state watermark predicates using the watermark for late events.
For example, with the join predicate L > R + 10, we construct the state watermark to be L <= watermark_for_eviction(R) + 10, and I had initially thought that the fix for this correctness bug was L <= watermark_for_late_events(R) + 10.
But it's not. Here's my reasoning. Let the watermark for late events be WM_L, and let the watermark for eviction be WM_E. For a given side (WLOG, the left), the state watermark will be that L <= WM_E + k, for some positive or negative quantity k.

Assume k is non-negative. In that case, WM_L <= WM_E <= WM_E + k. If the record is less than WM_L, it will be dropped. If it is greater than WM_L and less than WM_E + k, it will be evicted and then produce a null output. If it is greater than or equal to WM_E + k, it might join in the future, so we need to keep it in state. The quantity WM_L + k, the state watermark using the late-events watermark, doesn't actually have any real meaning, so we don't need to use it here.

Things work out similarly if k is negative, but I'll exclude that for brevity.
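The three regions described above can be sketched as a small classifier. This is only an illustration of the reasoning, not Spark code; the names (fateOfLeftRow, wmLate, wmEvict) are made up, and it assumes the non-negative-k case with event times as plain longs:

```scala
// Fate of a left-side row relative to WM_L and WM_E + k (k >= 0 assumed).
sealed trait Fate
case object DroppedAsLate extends Fate          // value < WM_L: filtered as late
case object EvictedWithNullOutput extends Fate  // WM_L <= value < WM_E + k: evicted this batch
case object KeptInState extends Fate            // value >= WM_E + k: may still join later

def fateOfLeftRow(value: Long, wmLate: Long, wmEvict: Long, k: Long): Fate = {
  require(k >= 0 && wmLate <= wmEvict) // the ordering WM_L <= WM_E <= WM_E + k
  if (value < wmLate) DroppedAsLate
  else if (value < wmEvict + k) EvictedWithNullOutput
  else KeptInState
}
```

For instance, with wmLate = 100, wmEvict = 110, k = 10, a row at 105 is not late yet still falls below WM_E + k, the middle region the discussion is about.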
Yes. Simply put, the reason we break the watermark down into late-event and eviction variants applies to stream-stream join as well, except that we need to buffer events longer than the watermark based on the join condition - hence the necessity of the state watermark (which is tied to eviction).
// and the join type is left semi.
// For other cases, the input should be added, including the case where it's going to be evicted
// in this batch. It hasn't yet been evaluated against inputs from the right side "for this batch".
// Learn about how each side figures out the matches from the other side.
I don't understand what this line means. Can you clarify?
We do not build a separate hashmap to match both sides; we rely entirely on the state store. Each side figures out its matches by looking into the state store of the other side, and there is a fixed sequence for doing this: left side first, then right side.

That said, when the operator seeks matches for the left side, the right side is yet to be handled, so the left side can only see the right side's state from "prior batches". We take care of matches within the current batch while evaluating the "right side", assuming that we put the left side's input into the state store. If we skip adding the input into the state store for the left side, we miss possible matching rows (correctness issue).
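A toy sketch of this sequence may make it concrete. Row, SideState, and processBatch below are made-up stand-ins, not Spark's actual classes; the point is only the ordering and why the left side must add its row even when it is about to be evicted:

```scala
import scala.collection.mutable

case class Row(key: Int, value: String)

// Simplified stand-in for one side's state store: key -> buffered rows.
class SideState {
  private val store = mutable.Map.empty[Int, mutable.Buffer[Row]]
  def add(r: Row): Unit = store.getOrElseUpdate(r.key, mutable.Buffer.empty[Row]) += r
  def matches(key: Int): Seq[Row] = store.getOrElse(key, mutable.Buffer.empty[Row]).toSeq
}

def processBatch(
    leftIn: Seq[Row], rightIn: Seq[Row],
    leftState: SideState, rightState: SideState): Seq[(Row, Row)] = {
  val out = mutable.Buffer.empty[(Row, Row)]
  // 1. Left side first: it can only see right-side state from prior batches.
  leftIn.foreach { l =>
    rightState.matches(l.key).foreach(r => out += ((l, r)))
    leftState.add(l) // must happen even if l is about to be evicted this batch
  }
  // 2. Right side second: it sees left-side state *including* this batch's rows.
  rightIn.foreach { r =>
    leftState.matches(r.key).foreach(l => out += ((l, r)))
    rightState.add(r)
  }
  out.toSeq
}
```

If the left side skipped `leftState.add(l)`, a right-side row with the same key arriving in the same batch would find no match in step 2 - exactly the missed-match correctness issue described above.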
If the classdoc covers this, I should just say to read through the classdoc. Do you get how it works from reading through the classdoc?
I understand the semantics, but I just didn't understand what your comment meant by "Learn about how...". Your latest commit addresses my concern.
// if the input is producing "unmatched row" in this batch
(
  (joinType == RightOuter && !iteratorNotEmpty) ||
  (joinType == FullOuter && !iteratorNotEmpty)
In the case of a left anti join, isn't it also possible that we would evict this record (and !iteratorNotEmpty), so we need to add it to state to produce it?
We don't support left anti join in streaming. supported: inner, left/right outer, full outer, left semi. Please check StreamingSymmetricHashJoinExec.
@dongjoon-hyun, this issue has been around since the time that we added multiple stateful operators to Structured Streaming. The left-semi join logic that you linked previously is correct, and Jungtaek preserves that behavior in this PR.

To @neilramaswamy, could you provide the affected version list? Actually, I'm tracking it to update the SPARK-49829 Affected Version field correctly. Currently it claims
https://issues.apache.org/jira/browse/SPARK-40925

Thank you, @HeartSaVioR.

Fix makes sense; will thoroughly review tests shortly.
isNotEvictingInThisBatch ||
// if the input is producing "unmatched row" in this batch
(
  (joinType == RightOuter && !iteratorNotEmpty) ||
So compared to the original logic, this PR specifically adds the RightOuter and FullOuter join types for the empty iterator case in the shouldAddToState scenario. Is my understanding correct?
No, the change is the following:
- For the left side, we store the new input row into the state store despite the fact that the row will be evicted in this batch (regardless of join type). This is required because the row is still yet to be checked for matches against new input rows on the right side "in this batch".
- For the right side, we don't strictly need to store all new input rows into the state store, but we still need to store them if they will produce "unmatched" output (right/full outer), because we rely on state eviction to produce the unmatched output.
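As a hypothetical restatement of those two bullets (this is a standalone sketch with simplified names mirroring the snippets above, not the PR's exact code):

```scala
// Simplified join-type tags; streaming supports inner, left/right outer,
// full outer, and left semi.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType
case object LeftSemi extends JoinType

// isFirstSide: the side processed first in the batch (left).
// isNotEvictingInThisBatch: the row survives this batch's eviction.
// matchedThisBatch: stand-in for iteratorNotEmpty in the real snippet.
def shouldAddToState(
    isFirstSide: Boolean,
    isNotEvictingInThisBatch: Boolean,
    joinType: JoinType,
    matchedThisBatch: Boolean): Boolean = {
  if (isFirstSide) {
    // Always add: rows from the second side in this batch still need to see it.
    true
  } else {
    // Add if it survives eviction, or if it must stay in state so that
    // eviction can emit its unmatched (null-outer) row.
    isNotEvictingInThisBatch ||
      ((joinType == RightOuter || joinType == FullOuter) && !matchedThisBatch)
  }
}
```

Note this sketch omits the left-semi-with-match skip that the PR preserves; it only captures the two bullets in this comment.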
@@ -878,6 +878,60 @@ class MultiStatefulOperatorsSuite
    testOutputWatermarkInJoin(join3, input1, -40L * 1000 - 1)
  }

// NOTE: This is a revision of the reproducer in SPARK-45637. CREDIT goes to @andrezjzera.
IIRC, if we want to add credit information, we can amend the PR commits to include a co-author. This way, we don’t need to add the credit information in the code comments.
Though the "co-authorship" approach is not fine-grained (e.g. here the credit is only for the test), I can do that instead if it's more comfortable for people. Also, I don't see us giving CREDIT this way elsewhere, so the preference is probably what you proposed.
// scalastyle:off line.size.limit
// DISCLAIMER: This is a revision of the below test, which was part of a report in the dev mailing
// list. CREDIT goes to @andrezjzera.
// https://github.com/andrzejzera/spark-bugs/blob/abae7a3839326a8eafc7516a51aca5e0c79282a6/spark-3.5/src/test/scala/OuterJoinTest.scala#L86C3-L167C4
ditto
What changes were proposed in this pull request?
The PR proposes to revise the optimization on adding input to state store in stream-stream join.
Why are the changes needed?
Here is the logic of optimization before this PR:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala
Lines 671 to 677 in 039fd13
The optimization was added when multiple stateful operators weren't yet supported. The criterion "both removal predicates do not match" means the input is going to be evicted in this batch. Before Spark introduced multiple stateful operators, the watermark for late records and the watermark for eviction were the same, hence the input would never meet this condition after late records were filtered out. (Not sure about the edge case this condition was dealing with.)

After multiple stateful operators, the watermark for late records and the watermark for eviction are no longer the same. (The watermark for late records is the watermark for eviction from the prior batch - consider that we advance the watermark "after processing all inputs", not before.) That said, an input can be determined as not late, yet be evicted in the same batch. The above condition should have reflected this change, but it was missed, hence the correctness issues in the report.
There are two major issues with this omission:

- Failing to add the input to the left side's state store prevents inputs on the right side from matching with "that" input. Even though the input is going to be evicted in this batch, there could still be inputs on the right side in this batch which match it.
- Failing to add the input to the state store prevents that input from producing unmatched (null-outer) output, as we produce unmatched output during the eviction of state.
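The gap between the two watermarks can be illustrated numerically. This is a toy model of the relationship stated above (the late-record watermark equals the prior batch's eviction watermark); the names and numbers are made up, with event times in arbitrary units:

```scala
// Toy model: per-batch watermark pair, assuming watermark = max event time - delay.
case class Watermarks(lateEvents: Long, eviction: Long)

// Advancing to the next batch: late-event filtering uses the *previous*
// batch's eviction watermark, while the eviction watermark moves forward
// (monotonically) based on this batch's max observed event time.
def advance(prev: Watermarks, maxEventTimeThisBatch: Long, delay: Long): Watermarks =
  Watermarks(
    lateEvents = prev.eviction,
    eviction = math.max(prev.eviction, maxEventTimeThisBatch - delay))
```

With delay = 10, a batch whose previous eviction watermark was 100 and which observes max event time 120 gets lateEvents = 100 and eviction = 110, so a row at time 105 is accepted as not late yet evicted in the very same batch - the case the old condition failed to handle.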
Does this PR introduce any user-facing change?
Yes. There are correctness issues in stream-stream join, especially when the output of a stateful operator is provided as input to the stream-stream join. The correctness issue is fixed by this PR.
How was this patch tested?
New UTs.
Was this patch authored or co-authored using generative AI tooling?
No.