Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite SQL builders, make substantial project updates. #586

Merged
merged 12 commits into from
Nov 15, 2024
Merged

Rewrite SQL builders, make substantial project updates. #586

merged 12 commits into from
Nov 15, 2024

Conversation

jheer
Copy link
Member

@jheer jheer commented Nov 8, 2024

  • Breaking: Rewrite mosaic-sql package to use a complete AST formulation. Add additional SQL helper methods.
  • Breaking: Drop agg SQL expression helper. We now perform automatic analysis to determine if an expression is an aggregate expression.
  • Breaking: Remove automatic internal casts, such as casting count() aggregates to 32-bit integers. As we dropped Arrow-JS in favor of Flechette, Arrow parsing now gracefully handles 64-bit integers, decimals, and other type conversions.
  • Move SQL transforms for binning and M4 to mosaic-sql package in a new transforms folder.
  • Update pre-aggregation optimizations to support arbitrary aggregate expressions, not just individual aggregate function calls. We use the SQL facilities to rewrite expressions for more complex optimized aggregations.
  • Update pre-aggregation optimizations to support aggregates with FILTER clauses.
  • Refactor pre-aggregation optimizations, moving them to the new mosaic-core/preagg folder.
  • Update plots, inputs, and spec parsing to use new SQL utilities. SQL expressions no longer provide column dependency lists, param listeners, or idiosyncratic annotations. Instead, needed metadata is now extracted using visitor methods that walk the SQL expression AST. All query generator methods now create full, analyzable ASTs using our updated SQL builders.
  • Improved TypeScript types and tests in mosaic-sql and mosaic-core.
  • Basic documentation updates (though more extensive additions needed later).

@jheer jheer requested a review from domoritz November 8, 2024 22:21
@jheer jheer changed the title Rewrite SQL builders, update project. Rewrite SQL builders, make substantial project updates. Nov 8, 2024
@domoritz
Copy link
Member

domoritz commented Nov 9, 2024

Sweet. I'll take a deeper look soon.

@@ -4184,6 +4184,10 @@
"resolved": "packages/sql",
"link": true
},
"node_modules/@uwdata/mosaic-sql-old": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, will fix!

import { walk } from './walk.js';

const aggrRegExp = new RegExp(`^(${aggregateNames.join('|')})`);
const funcRegExp = /(\\'|\\"|"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|\$\w+|\w+\()/g;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be more readable with comments.

Suggested change
const funcRegExp = /(\\'|\\"|"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|\$\w+|\w+\()/g;
const funcRegExp = /(
\\' | // Matches an escaped single quote (e.g., \')
\\" | // Matches an escaped double quote (e.g., \")
"(?:\\"|[^"])*" | // Matches a double-quoted string, allowing for escaped quotes (e.g., "text with \"escaped\" quotes")
'(?:\\'|[^'])*' | // Matches a single-quoted string, allowing for escaped quotes (e.g., 'text with \'escaped\' quotes')
\$\w+ | // Matches a variable starting with $ followed by word characters (e.g., $variableName)
\w+\( // Matches a function name, capturing any word characters followed by an opening parenthesis (e.g., functionName()
)/g;

const value = node[props[i]];
if (Array.isArray(value)) {
const m = value.length;
for (let j = 0; j < m; ++j) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a for...of

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but I wanted to keep the loops tight in cases of larger queries/expressions to walk. Using iterators adds non-trivial overhead. Maybe wouldn't really matter in this case, but I prefer to default to the more performant case for internals like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice to have this outside of vgplot now

@jheer jheer merged commit 31ed7ef into main Nov 15, 2024
3 checks passed
@jheer jheer deleted the jh/sql branch November 15, 2024 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants