Enhance incremental computation support in Texera #2165

zuozhiw · 2023-09-26T06:18:08Z

This PR enhances incremental computation support in Texera, including:

Added an incremental aggregation framework. Specifically:

PartialAggregateOpExec and FinalAggregateOpExec are updated to use incremental computation. They now periodically emit partial results downstream.
Aggregate and LineChart operators now use the new aggregation framework. Other aggregate-based visualizations do not use it because they are now implemented with Python UDFs and HTML visualizations.
WordCloud does not use the new framework because it is a special top-k aggregation.
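A minimal sketch of the partial/final split described above, assuming a sum/count aggregate and a Boolean flag on each output row to mark retractions. `AggRow`, `PartialSumCount`, and `FinalSumCount` are hypothetical names for illustration, not the PR's `PartialAggregateOpExec`/`FinalAggregateOpExec`; the clock is injected so the time-based trigger can be exercised deterministically.

```scala
import scala.collection.mutable

// Hypothetical output row: isInsert = false marks a retraction of an earlier result.
final case class AggRow(key: String, sum: Long, count: Long, isInsert: Boolean)

// Partial stage (sketch): accumulates per-key (sum, count) and flushes its buffer
// once more than intervalMs has passed since the last flush.
final class PartialSumCount(intervalMs: Long, now: () => Long) {
  private var partials = mutable.HashMap.empty[String, (Long, Long)]
  private var lastEmit: Long = now()

  def process(key: String, value: Long): List[AggRow] = {
    val (s, c) = partials.getOrElse(key, (0L, 0L))
    partials.update(key, (s + value, c + 1))
    if (now() - lastEmit > intervalMs) flush() else Nil
  }

  // Emits the buffered partials and starts a fresh, empty buffer.
  def flush(): List[AggRow] = {
    lastEmit = now()
    val out = partials.toList.map { case (k, (s, c)) => AggRow(k, s, c, isInsert = true) }
    partials = mutable.HashMap.empty[String, (Long, Long)]
    out
  }
}

// Final stage (sketch): merges partials into running totals. Each update first
// retracts the previously emitted row for that key, then emits the new total.
final class FinalSumCount {
  private val totals = mutable.HashMap.empty[String, (Long, Long)]

  def process(p: AggRow): List[AggRow] = {
    val previous = totals.get(p.key)
    val (s, c) = previous.getOrElse((0L, 0L))
    val updated = (s + p.sum, c + p.count)
    totals.update(p.key, updated)
    val retraction = previous.map { case (ps, pc) => AggRow(p.key, ps, pc, isInsert = false) }
    retraction.toList :+ AggRow(p.key, updated._1, updated._2, isInsert = true)
  }
}
```

Because the partial stage resets its buffer after each flush, the final stage has to add partials together rather than overwrite them, and it retracts its previous row for a key before emitting the updated one.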

Added a general incremental computation option for all operators. Specifically:

Added a new option supportRetractableInput to indicate whether an operator supports retractions as input tuples.
Added a new incremental computation enforcer that rewrites the workflow based on incremental computation requirements. It propagates the incremental properties and adds a "consolidate" operator if necessary.
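The enforcer's duty can be pictured as one pass over a topologically sorted plan: track which operators produce retractable output, and splice a consolidate operator in front of any consumer that does not declare supportRetractableInput. This is a simplified model, assuming a plan is just a list of operators plus edges; `Op`, `Plan`, and `RetractionEnforcerSketch` are hypothetical and do not reflect the real `ProgressiveRetractionEnforcer` signature.

```scala
import scala.collection.mutable

// Hypothetical plan model: operators carry the two incremental properties.
final case class Op(id: String, emitsRetractions: Boolean, supportsRetractableInput: Boolean)
final case class Plan(ops: List[Op], edges: List[(String, String)]) // (from, to)

object RetractionEnforcerSketch {
  // Assumes plan.ops is already in topological order.
  def enforce(plan: Plan): Plan = {
    val retractableOutput = mutable.Set.empty[String]
    val newOps = mutable.ListBuffer.empty[Op]
    val newEdges = mutable.ListBuffer.empty[(String, String)]

    for (op <- plan.ops) {
      var receivesRetractions = false
      for ((from, to) <- plan.edges if to == op.id) {
        if (retractableOutput(from) && !op.supportsRetractableInput) {
          // Consolidate turns a retractable stream into an insert-only one,
          // so its own output is not marked retractable.
          val consolidate =
            Op(s"consolidate-$from-${op.id}", emitsRetractions = false, supportsRetractableInput = true)
          newOps += consolidate
          newEdges += (from -> consolidate.id)
          newEdges += (consolidate.id -> op.id)
        } else {
          if (retractableOutput(from)) receivesRetractions = true
          newEdges += (from -> op.id)
        }
      }
      newOps += op
      // Propagate: output is retractable if the operator creates retractions
      // or passes through retractions it received.
      if (op.emitsRetractions || receivesRetractions) retractableOutput += op.id
    }
    Plan(newOps.toList, newEdges.toList)
  }
}
```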

Added a simple incremental join operator.
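One common way to build an incremental join is a symmetric hash join, which buffers both inputs and probes the opposite side on every arrival, so matches are emitted without waiting for either input to finish. The sketch below assumes insert-only inputs and integer join keys; `Row` and `SymmetricHashJoin` are hypothetical, and this is not necessarily how the PR's `IncrementalJoinOpExec` works.

```scala
import scala.collection.mutable

// Hypothetical tuple: an integer join key plus a payload.
final case class Row(key: Int, payload: String)

// Symmetric hash join (sketch): unlike build-then-probe, neither side must finish first.
final class SymmetricHashJoin {
  private val leftState = mutable.HashMap.empty[Int, mutable.ListBuffer[Row]]
  private val rightState = mutable.HashMap.empty[Int, mutable.ListBuffer[Row]]

  // Store the new left row, then join it with every right row seen so far.
  def onLeft(l: Row): List[(Row, Row)] = {
    leftState.getOrElseUpdate(l.key, mutable.ListBuffer.empty[Row]) += l
    rightState.get(l.key).map(_.toList).getOrElse(Nil).map(r => (l, r))
  }

  // Store the new right row, then join it with every left row seen so far.
  def onRight(r: Row): List[(Row, Row)] = {
    rightState.getOrElseUpdate(r.key, mutable.ListBuffer.empty[Row]) += r
    leftState.get(r.key).map(_.toList).getOrElse(Nil).map(l => (l, r))
  }
}
```

The trade-off is memory: both inputs are kept in state for the lifetime of the operator.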

For a detailed technical presentation on incremental computation, see this slide and the descriptions in this PR.

Yicong-Huang

The PR looks good and clean! I left some small comments in the code. However, I am not sure about the new incremental join operator: what is its behavior if the inputs are already retractable?

Yicong-Huang · 2023-10-01T04:18:37Z

core/amber/src/main/scala/edu/uci/ics/texera/workflow/common/workflow/WorkflowCompiler.scala

+    var rewrittenLogicalPlan =
      WorkflowCacheRewriter.transform(logicalPlan, opResultStorage, opsToReuseCache)
    rewrittenLogicalPlan.operatorMap.values.foreach(initOperator)

+    // perform rewrite to enforce progressive computation constraints
+    rewrittenLogicalPlan = ProgressiveRetractionEnforcer.enforceDelta(rewrittenLogicalPlan, context)


I suggest creating a new variable name for each step of the rewrite, as they are rewrites with different purposes.

Yicong-Huang · 2023-11-16T15:55:31Z

...in/scala/edu/uci/ics/texera/workflow/common/operators/aggregate/PartialAggregateOpExec.scala

+  private def shouldEmitOutput(): Boolean = {
+    System.currentTimeMillis - lastUpdatedTime > UPDATE_INTERVAL_MS
+  }
+
+  private def emitOutputAndResetState(): scala.Iterator[Tuple] = {
+    lastUpdatedTime = System.currentTimeMillis
+    val resultIterator = getPartialOutputs()
+    this.partialObjectsPerKey = new mutable.HashMap[List[Object], List[Object]]()
+    resultIterator
+  }


I see similar code in the partial and final aggregate operators that takes time-based snapshots to push partial results out. If time-based snapshotting is a universal strategy for incremental operators to push out partial results, would it be better to make it a standard framework?
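A minimal sketch of what such a shared framework could look like: a mix-in that owns the timer and the emit-then-reset sequence, while each operator supplies only its snapshot and reset logic. `TimeBasedSnapshot`, `maybeEmit`, `emitNow`, and `CountingOp` are hypothetical names; `now()` is overridable so tests can use a fake clock.

```scala
// Hypothetical mix-in factoring out the "emit every N ms, then reset" logic.
trait TimeBasedSnapshot[O] {
  protected def updateIntervalMs: Long
  protected def now(): Long = System.currentTimeMillis()
  // Produces the current partial results.
  protected def snapshot(): List[O]
  // Hook run after each snapshot; cumulative operators can leave it as a no-op.
  protected def onSnapshotEmitted(): Unit = ()

  private var lastEmitted: Option[Long] = None

  // Call after each input tuple; the first call only starts the timer.
  final def maybeEmit(): List[O] = lastEmitted match {
    case None =>
      lastEmitted = Some(now())
      Nil
    case Some(last) if now() - last > updateIntervalMs =>
      emitNow()
    case Some(_) =>
      Nil
  }

  // Also usable to flush at end of input.
  final def emitNow(): List[O] = {
    lastEmitted = Some(now())
    val out = snapshot()
    onSnapshotEmitted()
    out
  }
}

// Example operator: counts tuples and resets the count after each snapshot.
final class CountingOp(protected val updateIntervalMs: Long, clock: () => Long)
    extends TimeBasedSnapshot[Int] {
  private var count = 0
  override protected def now(): Long = clock()
  def process(): List[Int] = { count += 1; maybeEmit() }
  protected def snapshot(): List[Int] = List(count)
  override protected def onSnapshotEmitted(): Unit = { count = 0 }
}
```

A partial aggregate would clear its buffer in `onSnapshotEmitted`, mirroring `emitOutputAndResetState` above, while an operator that keeps cumulative state would leave the hook empty.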

Yicong-Huang · 2023-11-16T15:58:13Z

core/amber/src/main/scala/edu/uci/ics/texera/workflow/common/tuple/Tuple.java

@@ -272,9 +272,9 @@ public BuilderV2 add(String attributeName, AttributeType attributeType, Object f
         */
        public BuilderV2 addSequentially(Object[] fields) {
            checkNotNull(fields);
-            checkSchemaMatchesFields(schema.getAttributes(), Lists.newArrayList(fields));


I think we need such an assertion for the normal tuple fields. If we need to add new fields (e.g., whether a tuple is a retraction), can we treat them separately? If so, I can do it in a future PR.
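A sketch of keeping the assertion for normal fields while handling the retraction flag on a separate path, assuming a schema that lists its data attributes apart from a `retractable` marker. `Attribute`, `Schema`, `TupleSketch`, and `TupleBuilderSketch` are hypothetical Scala stand-ins, not the Java `Tuple.BuilderV2` API.

```scala
// Hypothetical model: data attributes are listed separately from the retraction marker.
final case class Attribute(name: String, kind: Class[_])
final case class Schema(data: List[Attribute], retractable: Boolean)
final case class TupleSketch(fields: List[Any], isInsert: Option[Boolean])

final class TupleBuilderSketch(schema: Schema) {
  private var fields: List[Any] = Nil
  private var isInsert: Option[Boolean] = None

  // The schema/field check still guards the normal fields...
  def addSequentially(values: List[Any]): TupleBuilderSketch = {
    require(values.length == schema.data.length,
      s"expected ${schema.data.length} fields, got ${values.length}")
    schema.data.zip(values).foreach { case (attr, value) =>
      require(attr.kind.isInstance(value), s"field ${attr.name} is not a ${attr.kind.getSimpleName}")
    }
    fields = values
    this
  }

  // ...while the retraction flag is set and validated on its own.
  def retractFlag(insert: Boolean): TupleBuilderSketch = {
    require(schema.retractable, "schema does not carry a retraction flag")
    isInsert = Some(insert)
    this
  }

  def build(): TupleSketch = {
    require(!schema.retractable || isInsert.isDefined, "retractable schema requires a retraction flag")
    TupleSketch(fields, isInsert)
  }
}
```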

Yicong-Huang · 2023-11-16T15:59:22Z

...c/main/scala/edu/uci/ics/texera/workflow/common/workflow/ProgressiveRetractionEnforcer.scala

+
+import scala.collection.mutable.ArrayBuffer
+
+object ProgressiveRetractionEnforcer {


Could you add some doc to explain this enforcer's duty?

Yicong-Huang · 2023-11-16T16:00:50Z

.../main/scala/edu/uci/ics/texera/workflow/operators/aggregate/SpecializedAggregateOpDesc.scala

@@ -71,6 +72,7 @@ class SpecializedAggregateOpDesc extends AggregateOpDesc {
    }
    Schema
      .newBuilder()
+      .add(ProgressiveUtils.insertRetractFlagAttr)


Alternatively, we can have a Builder.allowRetract() to add this attribute internally for users?
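A minimal sketch of the suggested `allowRetract()` option, assuming the builder prepends the flag attribute, as the diff does with `ProgressiveUtils.insertRetractFlagAttr`. `Attr`, `SchemaSketch`, `SchemaBuilderSketch`, and the attribute name used here are hypothetical.

```scala
final case class Attr(name: String, tpe: String)
final case class SchemaSketch(attributes: List[Attr])

object SchemaBuilderSketch {
  // Hypothetical stand-in for ProgressiveUtils.insertRetractFlagAttr; the real name and type may differ.
  val RetractFlagAttr: Attr = Attr("__is_insertion", "boolean")

  def newBuilder(): Builder = new Builder

  final class Builder {
    private val attrs = scala.collection.mutable.ListBuffer.empty[Attr]
    private var retractable = false

    def add(attr: Attr): Builder = { attrs += attr; this }

    // Callers only state intent; the builder owns the flag's name, type, and position.
    def allowRetract(): Builder = { retractable = true; this }

    def build(): SchemaSketch = {
      // Drop any manually added copy so the flag appears exactly once, first.
      val data = attrs.toList.filterNot(_ == RetractFlagAttr)
      SchemaSketch(if (retractable) RetractFlagAttr :: data else data)
    }
  }
}
```

Keeping the flag inside the builder means callers cannot misspell, duplicate, or misplace it.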

Yicong-Huang · 2023-11-16T16:03:48Z

...er/src/main/scala/edu/uci/ics/texera/workflow/operators/hashJoin/IncrementalJoinOpExec.scala

+    val builder = Tuple
+      .newBuilder(operatorSchemaInfo.outputSchemas(0))
+      .add(left)


Is there a case where the input left tuples and/or right tuples are already supporting retraction?
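If the inputs may already carry retractions, one possible behavior, assuming each input tuple has an insert/retract flag and the join is an equi-join on an integer key: an insertion is added to its side's state and joined as usual, while a retraction removes one matching copy from its side's state and emits a retraction for every result it previously contributed to. `Rec`, `Change`, `JoinChange`, and `RetractableHashJoin` are hypothetical; this describes a design option, not the current `IncrementalJoinOpExec`.

```scala
import scala.collection.mutable

final case class Rec(key: Int, payload: String)
final case class Change(row: Rec, isInsert: Boolean)
final case class JoinChange(left: Rec, right: Rec, isInsert: Boolean)

final class RetractableHashJoin {
  private val left = mutable.HashMap.empty[Int, mutable.ListBuffer[Rec]]
  private val right = mutable.HashMap.empty[Int, mutable.ListBuffer[Rec]]

  // Insertions add to the side's state; retractions remove one previously inserted copy.
  private def applyChange(state: mutable.HashMap[Int, mutable.ListBuffer[Rec]], c: Change): Unit = {
    val bucket = state.getOrElseUpdate(c.row.key, mutable.ListBuffer.empty[Rec])
    if (c.isInsert) bucket += c.row
    else {
      val i = bucket.indexOf(c.row)
      require(i >= 0, s"retraction of a row that was never inserted: ${c.row}")
      bucket.remove(i)
    }
  }

  // Output results inherit the flag of the input change that produced them.
  def onLeft(c: Change): List[JoinChange] = {
    applyChange(left, c)
    right.get(c.row.key).map(_.toList).getOrElse(Nil).map(r => JoinChange(c.row, r, c.isInsert))
  }

  def onRight(c: Change): List[JoinChange] = {
    applyChange(right, c)
    left.get(c.row.key).map(_.toList).getOrElse(Nil).map(l => JoinChange(l, c.row, c.isInsert))
  }
}
```

This keeps the invariant that the net emitted output equals the join of the two sides' current states, so a downstream operator that supports retractable input sees a consistent result.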

Yicong-Huang · 2023-12-19T19:37:04Z

Will revisit after the compiler refactoring.

zuozhiw added 18 commits June 5, 2023 21:20

wip (911e8b0)
merge (3276f29)
support workflows without a view result operator (d2eecc5)
update (7b60978)
update (e5e89e9)
wip (c708787)
Merge branch 'master' into zuozhi-remove-sink (4c2e670)
wip (6d876ed)
complete (2089538)
complete cache (5cc37aa)
wip (59e90f9)
wip (b4c8116)
wip (eb3402b)
merge (cc06e00)
clean up (8563bd9)
fix format (6ea5dec)
format (4bc62ab)
add comments (9296f7c)

Yicong-Huang assigned zuozhiw Sep 27, 2023

zuozhiw added 2 commits September 29, 2023 00:16

update (1dc1246)
format (634d477)

zuozhiw requested review from Yicong-Huang and shengquan-ni September 29, 2023 07:17

remove unrelated change (50cd751)

zuozhiw changed the title from "Enhance incremental computation support in Texera [WIP]" to "Enhance incremental computation support in Texera" Sep 29, 2023

Merge branch 'master' into zuozhi-incremental (2fc718d)

Yicong-Huang approved these changes Nov 16, 2023
