Streamline Expansion of ReasonSets #330

okennedy · 2019-06-26T14:59:04Z

As discussed in the writeup of Variable Generating Relational Algebra and in this paper, Mimir uses special Expression objects called VGTerms (and DataWarnings) to encode uncertainty about a value being computed over. These objects 'tag' results with a warning. Every tag is associated with a 3-part identifier:
( model:String, index:Int, key:Seq[PrimitiveValue] )
Identifiers are generated as part of query processing. The model and index fields are static: They are a fixed part of the query. The key field is generated during query processing (a typical use case is to pass a ROWID).
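
For illustration, the identifier could be modelled roughly as follows. This is a minimal sketch, not Mimir's actual class layout: `TagId`, `TagIdExample`, the simplified `PrimitiveValue` hierarchy, and the model name are all hypothetical stand-ins chosen to show which parts are static and which are filled in per row.

```scala
// Simplified stand-ins for illustration only; not Mimir's real definitions.
sealed trait PrimitiveValue
case class IntValue(v: Long) extends PrimitiveValue
case class RowIdValue(v: String) extends PrimitiveValue

// The 3-part identifier: model and index are static, key is dynamic.
case class TagId(model: String, index: Int, key: Seq[PrimitiveValue])

object TagIdExample {
  // Static components: fixed when the query is written/compiled.
  val model: String = "MISSING_VALUE:price" // hypothetical model name
  val index: Int = 0

  // Dynamic component: produced while the query runs, here from a ROWID.
  def tagFor(rowId: String): TagId =
    TagId(model, index, Seq(RowIdValue(rowId)))
}
```

Two rows tagged by the same VGTerm would then share `model` and `index` and differ only in `key`.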

Specific identifiers are dropped during normal (BestGuess) query processing, and the query tracks only whether a given row or cell has been tagged (and not which specific warning tagged it). To understand what the warning is, a user needs to issue a separate query:

ANALYZE SELECT ....

This query returns a sequence of Reason objects, which include a human-readable explanation, and a process for resolving the error.

Analyze queries are handled by AnalyzeUncertainty in a two-stage process. The first stage is AnalyzeUncertainty.explainSubsetWithoutOptimizing, a static pass over the query (i.e., no data is touched) that identifies every VGTerm (and DataWarning) in the query. The static pass returns a collection of ReasonSets. Every ReasonSet includes the static components of the identifier (model, index), as well as a relational algebra query that generates the dynamic components (the fields of key).
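
The idea of a static pass can be sketched as a plain tree walk over an expression. The toy `Expr` hierarchy and `StaticPass.collect` below are assumptions made for illustration and do not mirror explainSubsetWithoutOptimizing; the point is only that the (model, index) pair and the key-generating arguments can be read off the query without touching data.

```scala
// Toy expression tree; hypothetical, not Mimir's Expression classes.
sealed trait Expr
case class Var(name: String) extends Expr
case class Const(v: Int) extends Expr
case class Arith(op: String, lhs: Expr, rhs: Expr) extends Expr
case class VGTermSketch(model: String, index: Int, keyArgs: Seq[Expr]) extends Expr

object StaticPass {
  // Collect (model, index, keyArgs) for every VGTerm in the expression.
  // No rows are read: the key arguments stay as unevaluated expressions.
  def collect(e: Expr): Seq[(String, Int, Seq[Expr])] = e match {
    case VGTermSketch(m, i, args) => (m, i, args) +: args.flatMap(collect)
    case Arith(_, l, r)           => collect(l) ++ collect(r)
    case _                        => Seq.empty
  }
}
```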

The second pass (which presently happens in the front-end) expands each ReasonSet into a collection of Reason objects by executing that ReasonSet's query, generating the appropriate keys, and filling them in to Reason objects. Typical usage is through the take function of ReasonSet, which returns a limited number of Reasons.
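
As a rough sketch of this expansion step, the class below pairs the static identifier components with a key-producing query and expands on demand. `ReasonSetSketch`, `ReasonSketch`, and the function-valued `keyQuery` are hypothetical; in Mimir the key source is a relational algebra query, which is replaced here by a thunk returning an iterator so the example is self-contained.

```scala
// Hypothetical simplification of a Reason: identifier plus a message.
case class ReasonSketch(model: String, index: Int, key: Seq[String], message: String)

// keyQuery stands in for the relational algebra query that yields keys.
class ReasonSetSketch(
  val model: String,
  val index: Int,
  keyQuery: () => Iterator[Seq[String]]
) {
  // Run the key query once and build at most n Reasons from its output.
  def take(n: Int): List[ReasonSketch] =
    keyQuery()
      .take(n)
      .map { key =>
        ReasonSketch(model, index, key, s"$model[$index] affects key (${key.mkString(", ")})")
      }
      .toList
}
```

Each call to `take` re-runs the key query, and every ReasonSet runs its own query, which is where repeated work over a common source query can accumulate.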

This ReasonSet expansion process includes a significant amount of redundant computation, as the ReasonSet queries are generally generated from the same source query. The goal of this project would be to streamline this expansion process, through materialized views, parallel execution, or inlining multiple queries into the same single-pass query.
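
One of the options named above, inlining several ReasonSet queries into a single pass, might look roughly like the following. This is a sketch under strong assumptions: every ReasonSet is reduced to a per-row predicate over the same source scan, keys are just ROWIDs, and `Row`, `SetSpec`, and `SharedScan` are invented names. It does not handle joins, limits, or differing source queries.

```scala
// Hypothetical row and ReasonSet descriptions for illustration.
case class Row(rowId: String, values: Map[String, Option[Int]])
case class SetSpec(model: String, index: Int, tagged: Row => Boolean)

object SharedScan {
  type Keys = Map[(String, Int), List[String]]

  // Current behaviour, roughly: one scan of the source per ReasonSet.
  def expandSeparately(source: () => Iterator[Row], specs: Seq[SetSpec]): Keys =
    specs.map { s =>
      (s.model, s.index) -> source().filter(s.tagged).map(_.rowId).toList
    }.toMap

  // Inlined variant: scan the source once, test every spec on each row.
  def expandTogether(source: () => Iterator[Row], specs: Seq[SetSpec]): Keys = {
    val acc = specs.map(s => (s.model, s.index) -> List.newBuilder[String]).toMap
    for (row <- source(); s <- specs if s.tagged(row)) {
      val builder = acc((s.model, s.index))
      builder += row.rowId
    }
    acc.map { case (id, builder) => id -> builder.result() }
  }
}
```

Both functions produce the same keys; the difference is that `expandTogether` invokes `source` once regardless of how many specs there are, at the cost of no longer being able to stop early for one ReasonSet independently of the others.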

okennedy added the enhancement, backend, explain/analyze, and 662 Project labels on Jun 26, 2019
