Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add AST analyzer middleware to optionally rewrite queries before execution #55

Merged

Conversation

phillipleblanc
Copy link
Collaborator

Adds an AST analyzer to the SQLExecutor trait to allow modifying the query AST before execution.

peasee and others added 2 commits September 5, 2024 10:16
* feat: Add AST analyzer middleware

* fix: Remove redundant ast_analyzer call

* chore: Variable does not need to be mutable
@phillipleblanc
Copy link
Collaborator Author

This is used in the SQLite Table Provider to perform a non-trivial rewrite of interval expressions to use the SQLite native date() or datetime() functions, since SQLite doesn't natively support intervals.

https://github.com/datafusion-contrib/datafusion-table-providers/blob/main/src/sqlite/federation.rs#L49

Copy link
Collaborator

@backkem backkem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to this would be to implement a FederationProvider directly for execution engines with partial SQL dialect support. Maybe we could look into making that more trivial at some point.

I'll accept this for now since the above would be more effort and there is a default implementation that should not get in the way.

@hozan23 hozan23 merged commit ed418c5 into datafusion-contrib:main Sep 6, 2024
7 checks passed
@phillipleblanc phillipleblanc deleted the phillip/240905-ast-analyzer branch September 6, 2024 12:56
@jonmmease
Copy link

Apologies for commenting in this old thread, but I'm working on learning about DataFusion's unparse sql generation support and datafusion-federation. Is there a reason (other than project logistics) that this kind of interval rewrite logic for sqllite should be done in an ast_analyzer step here in datafusion-table-providers rather than in the unparse logic in DataFusion itself?

I'm exploring replacing the sql-generation system we have in VegaFusion with unparse and datafusion-federation, so just trying to get a feel for where dialect customization should happen.

@hozan23
Copy link
Collaborator

hozan23 commented Oct 11, 2024

Hello @jonmmease,
There was a discussion about this in issue #61

@backkem
Copy link
Collaborator

backkem commented Oct 12, 2024

@jonmmease maybe there are already table providers in datafusion-contrib/datafusion-table-providers that meet your needs? These also implement federation. That are used if you register the federation optimizer.

@jonmmease
Copy link

Thanks @backkem, yes these table providers look great, and I'm excited to learn more about how federation works!

I'm thinking about eventually adding support for some additional dialects (like Clickhouse, Spark, MSSQL, etc) and wondering if this should primarily be contributed upstream in the unparse logic in the datafusion-sql crate, or if I should customize the generic dialect there with the proper quote style and then do semantic transformations (like different names for functions) in an ast_analyzer implementation.

@backkem
Copy link
Collaborator

backkem commented Oct 12, 2024

RE where to land things: it's a good question. Right now I think we have roughly:

  • Unparser: pure plan to SQL logic. It knows about some syntactic differences but not engine featureset.
  • datafusion-federation: framework to split plans across one or more remote query engines by sending entire sub-plans to the remote (can be SQL but could also be Substrait or other formats). The SQLExecutor assumes the remote is a 'full' SQL engine. Alternatively implementers of the FederationProvider can provide an Optimizer that more selectively cuts out a sub-plan.
  • datafusion-table-providers: table providers that function with or without federation. Currently most remote/engine specific stuff is here. I think this makes sense if you want the table providers to work both with or without federation.

Things can evolve but this is the balance that emerged so far. I'm happy to host more plan re-writing tools/logic that is federation related in this repo.

@jonmmease
Copy link

Thanks for that context @backkem, I appreciate it and it's very helpful. As I've thought about it, it does make sense for the semantic customization of dialects to live alongside the table provider implementations for the sake of testing the generated SQL.

@phillipleblanc
Copy link
Collaborator Author

phillipleblanc commented Oct 12, 2024

Yeah, I still think its not 100% clear where the responsibility of the Unparser dialect ends and where something like the AST rewriter in the specific implementation begins. Like @backkem, I tend to think that the core Unparser should mostly be focused on syntactical differences between the dialects whereas more complex differences in how, e.g., functions work can be implemented by the AST rewriter (as we did for handling intervals in SQLite: datafusion-contrib/datafusion-table-providers#85)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants