As discussed in the writeup of Variable Generating Relational Algebra and in this paper, Mimir uses special Expression objects called VGTerms (and DataWarnings) to encode uncertainty about a value used in a computation. These objects 'tag' results with a warning. Every tag is associated with a three-part identifier: ( model:String, index:Int, key:Seq[PrimitiveValue] )
Identifiers are generated as part of query processing. The model and index fields are static: they are a fixed part of the query. The key field is computed while the query runs (a typical use case is to pass a ROWID).
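To make the static/dynamic split concrete, here is a minimal sketch of the identifier as Scala case classes. The names StaticId, TagId and forRow are made up for illustration, and the PrimitiveValue hierarchy is a simplified stand-in rather than Mimir's actual one.

```scala
// Simplified stand-ins for illustration; Mimir's real PrimitiveValue hierarchy is richer.
sealed trait PrimitiveValue
case class StringPrimitive(v: String) extends PrimitiveValue
case class IntPrimitive(v: Long) extends PrimitiveValue

// Static half of the identifier: known as soon as the query is written.
// (Hypothetical name, not part of Mimir.)
case class StaticId(model: String, index: Int)

// Full identifier: the static half plus a key computed per row.
// (Hypothetical name, not part of Mimir.)
case class TagId(model: String, index: Int, key: Seq[PrimitiveValue])

object IdentifierSketch {
  // Complete a static identifier with a key built from a row's ROWID.
  def forRow(static: StaticId, rowid: String): TagId =
    TagId(static.model, static.index, Seq(StringPrimitive(rowid)))
}
```

Under these assumptions, one StaticId in the query text fans out into one TagId per tagged row at execution time.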
Specific identifiers are dropped during normal (BestGuess) query processing, and the query tracks only whether a given row or cell has been tagged (and not which specific warning tagged it). To understand what the warning is, a user needs to issue a separate query:
ANALYZE SELECT ....
This query returns a sequence of Reason objects, each of which includes a human-readable explanation and a process for resolving the error.
Analyze queries are handled by AnalyzeUncertainty. Handling these queries is a two-stage process. The first stage is AnalyzeUncertainty.explainSubsetWithoutOptimizing. This is a static pass over the query (i.e., no data is touched) that identifies every VGTerm (and DataWarning) in the query. The static pass returns a collection of ReasonSets. Every ReasonSet includes the static components of the identifier (model, index), as well as a relational algebra query that generates the dynamic components (the fields of key).
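The static pass can be pictured as a walk over the expression tree that never touches data. The sketch below uses a toy expression type, not Mimir's Expression classes, and it simplifies a ReasonSet to the key expressions themselves instead of a full relational algebra query; Expr, ToyVGTerm, ReasonSetSketch and StaticPass are all hypothetical names.

```scala
// Toy expression tree for illustration only (not Mimir's Expression hierarchy).
sealed trait Expr
case class Var(name: String) extends Expr
case class Const(v: Long) extends Expr
case class Arith(op: String, lhs: Expr, rhs: Expr) extends Expr
case class ToyVGTerm(model: String, index: Int, args: Seq[Expr]) extends Expr

// Static identifier plus the (unevaluated) expressions that would produce the key.
case class ReasonSetSketch(model: String, index: Int, keyExprs: Seq[Expr])

object StaticPass {
  // Collect one ReasonSetSketch per VGTerm found in the tree; nothing is evaluated.
  def collect(e: Expr): Seq[ReasonSetSketch] = e match {
    case ToyVGTerm(m, i, args) =>
      ReasonSetSketch(m, i, args) +: args.flatMap(collect)
    case Arith(_, l, r) =>
      collect(l) ++ collect(r)
    case Var(_) | Const(_) =>
      Seq.empty
  }
}
```

Because the pass only inspects the query's structure, its cost depends on the size of the query and not on the size of the data.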
The second pass (which presently happens in the front-end) expands each ReasonSet into a collection of Reason objects by executing the ReasonSet's query, generating the appropriate keys, and filling them into Reasons. Typical usage is through the take function of ReasonSet, which returns a limited number of Reasons.
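As a rough illustration of this expansion step, the sketch below models a ReasonSet whose key query is a plain function returning an iterator of keys. LazyReasonSet, ReasonSketch and the keyQuery parameter are assumptions made for the example; the real ReasonSet executes a relational algebra query against the database.

```scala
// Hypothetical, simplified Reason: identifier plus a human-readable message.
case class ReasonSketch(model: String, index: Int, key: Seq[String], message: String)

// keyQuery stands in for executing the ReasonSet's relational algebra query.
class LazyReasonSet(
  val model: String,
  val index: Int,
  keyQuery: () => Iterator[Seq[String]]
) {
  // Run the key query once and materialize at most n Reasons from its results.
  def take(n: Int): List[ReasonSketch] =
    keyQuery()
      .take(n)
      .map { key =>
        val where = key.mkString(",")
        ReasonSketch(model, index, key, s"Model '$model' (index $index) supplied a guess for key [$where]")
      }
      .toList
}
```

In this sketch every call to take re-runs keyQuery, which mirrors why expanding many ReasonSets derived from the same source query repeats work.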
This ReasonSet expansion process involves a significant amount of redundant computation, as the ReasonSet queries are generally derived from the same source query. The goal of this project would be to streamline the expansion process through materialized views, parallel execution, or inlining multiple queries into a single-pass query.
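Of the three options, the inlining idea is the easiest to sketch in isolation. The example below is an in-memory toy, not a proposal for the actual implementation: it assumes each ReasonSet can be reduced to a row filter and a key extractor over a shared source (the hypothetical KeySpec), and contrasts one scan per ReasonSet with a single scan that serves all of them.

```scala
object SharedExpansion {
  type Row = Map[String, String]
  type Keys = Map[(String, Int), List[Seq[String]]]

  // Hypothetical reduction of a ReasonSet: static id, row filter, key extractor.
  case class KeySpec(model: String, index: Int, applies: Row => Boolean, key: Row => Seq[String])

  // Baseline: the source query is executed once per ReasonSet.
  def expandSeparately(source: () => Iterator[Row], specs: Seq[KeySpec]): Keys =
    specs.map { s =>
      (s.model, s.index) -> source().filter(s.applies).map(s.key).toList
    }.toMap

  // Single pass: the source query is executed once and every spec is
  // evaluated against each row as it streams by.
  def expandTogether(source: () => Iterator[Row], specs: Seq[KeySpec]): Keys = {
    val acc = specs.map(s => (s.model, s.index) -> List.newBuilder[Seq[String]]).toMap
    for (row <- source(); s <- specs if s.applies(row)) {
      acc((s.model, s.index)) += s.key(row)
    }
    acc.map { case (id, builder) => id -> builder.result() }
  }
}
```

Both functions return the same keys; the difference is that expandTogether scans the source once regardless of how many ReasonSets there are. The sketch assumes each (model, index) pair appears in at most one spec.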