Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add sqlglot to parse sql dataflow #3310

Merged
merged 16 commits into from
Dec 31, 2024

Conversation

Light2Dark
Copy link
Contributor

@Light2Dark Light2Dark commented Dec 30, 2024

📝 Summary

Fixes #3103. Adds sqlglot as an optional dependency to handle complex sql refs parsing. Also supports DML statements (insert, update and delete).

🔍 Description of Changes

  • set sqlglot version to lowest possible (likely >23.4, due to ibis lib)
  • If we're keen on supporting dml's, then I could add more tests (I added some here alrdy)
  • def find_sql_def() can also be modified to use sqlglot in the future, not included in this PR.
  • small change: updated duckdb version to 1.1.0 to be synchronised across all installations (some were 1.0 and some were 1.1)

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • I have added tests for the changes made.
  • I have run the code and verified that it works as expected.

📜 Reviewers

@akshayka OR @mscolnick

Copy link

vercel bot commented Dec 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
marimo-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 31, 2024 3:57am
marimo-storybook ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 31, 2024 3:57am

@Light2Dark Light2Dark changed the title feat: Add sqlglot to parse sql dataflow feat: add sqlglot to parse sql dataflow Dec 30, 2024
Copy link
Contributor

@akshayka akshayka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic, thank you so much! I'm very excited by what a migration to sqlglot will unlock in the future, including support for other dialects.

Left one nit and one question.

pyproject.toml Outdated Show resolved Hide resolved
marimo/_ast/sql_visitor.py Show resolved Hide resolved
Copy link
Contributor

@akshayka akshayka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment, and then looks good to me. Thanks for this change!

marimo/_ast/sql_visitor.py Outdated Show resolved Hide resolved
Comment on lines 629 to 639
@staticmethod
def test_find_sql_refs_with_recursive_cte() -> None:
sql = """
WITH RECURSIVE cte AS (
SELECT 1 AS n FROM table1
UNION ALL
SELECT n + 1 FROM cte WHERE n < 10
)
SELECT * FROM cte;
"""
assert find_sql_refs(sql) == ["table1"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not yet supported, but you'd like to keep the test, you can decorate it with pytest.mark.xfail, as done here:

@pytest.mark.xfail(reason="Multiple CTEs are not supported")

Copy link
Contributor Author

@Light2Dark Light2Dark Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does work, I didn't see the need for it since we have tested nested cte's. Also noted on the dialect change. I may add back the has_duckdb dependency too.

thanks!!

Comment on lines +731 to +733
def test_find_sql_refs_invalid_sql() -> None:
sql = "SELECT * FROM"
assert find_sql_refs(sql) == []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test, thanks

Copy link
Contributor

@akshayka akshayka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work! This will have a large impact on marimo's SQL support; thanks so much for getting the ball rolling!

@akshayka akshayka merged commit be4d05c into marimo-team:main Dec 31, 2024
30 checks passed
Copy link

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.10.10-dev1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More comprehensive parsing of the SQL dataflow graph
3 participants