GitHub CI/CD Bot: Add option to run a data diff for models changed and those impacted downstream #3297

Open
sungchun12 opened this issue Oct 25, 2024 · 0 comments
Labels
Feature Adds new functionality

@sungchun12 (Contributor)

Problem: When a user changes their models, the bot shows how the code changed, which models are impacted, and the unit test results. However, how the data itself changed remains opaque. Without running ad hoc queries to compare dev and prod, it's hard to review the PR with confidence.

Solution: Add a config option that adds a tab to the GitHub Action's formatted output showing data diff stats. The diff should run automatically, picking up each model's grain to decide how dev and prod rows are matched.

Example config:

```yaml
cicd_bot:
    type: github
    merge_method: squash
    enable_deploy_command: true
    enable_table_diff: true # new option proposed in this issue
    auto_categorize_changes:
      external: full
      python: full
      sql: full
      seed: full
```
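To make "diff by picking up the grain" concrete, here is a minimal in-memory sketch of the kind of aggregate stats such a tab could show. It assumes rows arrive as Python dicts and that the grain is a list of column names (for example, the grain a model declares). `diff_stats` and `DiffStats` are hypothetical names for illustration only, not an existing SQLMesh API; a real implementation would push this comparison down into the warehouse as SQL rather than load rows into memory.

```python
from dataclasses import dataclass, field


@dataclass
class DiffStats:
    """Aggregate counts comparing a prod table with its dev counterpart."""
    added: int = 0          # grain keys present only in dev
    removed: int = 0        # grain keys present only in prod
    joined: int = 0         # grain keys present in both
    changed_rows: int = 0   # joined rows where at least one non-grain column differs
    column_mismatches: dict[str, int] = field(default_factory=dict)


def _index(rows, grain, side):
    """Key rows by their grain values; a non-unique grain makes the join ambiguous."""
    indexed = {}
    for row in rows:
        key = tuple(row[col] for col in grain)
        if key in indexed:
            raise ValueError(f"grain {grain} is not unique in {side}: {key}")
        indexed[key] = row
    return indexed


def diff_stats(prod_rows, dev_rows, grain):
    """Join prod and dev rows on the grain and return counts only (no row values)."""
    prod = _index(prod_rows, grain, "prod")
    dev = _index(dev_rows, grain, "dev")
    stats = DiffStats()
    stats.added = len(dev.keys() - prod.keys())
    stats.removed = len(prod.keys() - dev.keys())
    shared = prod.keys() & dev.keys()
    stats.joined = len(shared)
    for key in shared:
        p, d = prod[key], dev[key]
        # Compare only columns present on both sides, excluding the grain itself.
        diff_cols = [c for c in sorted(p.keys() & d.keys()) if c not in grain and p[c] != d[c]]
        if diff_cols:
            stats.changed_rows += 1
        for col in diff_cols:
            stats.column_mismatches[col] = stats.column_mismatches.get(col, 0) + 1
    return stats


if __name__ == "__main__":
    prod = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 5}]
    dev = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 7}]
    print(diff_stats(prod, dev, grain=["id"]))
```

Returning only counts keeps the output in line with the note below about not exposing sample data, and failing loudly on a non-unique grain avoids silently misreporting matches.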

Note: we probably want to prevent showing sample data, and give the user a warning or error explaining that this is not allowed in the official CI/CD bot for security reasons.
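One way to implement the "warning or error" behavior is a small guard applied when the bot loads its table diff options. This is a sketch under assumptions: the option name `show_sample` and the function `sanitize_table_diff_options` are hypothetical, since the issue does not say how sample output would be requested.

```python
import warnings

# Hypothetical option name; the issue does not specify how sample rows would be requested.
FORBIDDEN_OPTIONS = ("show_sample",)


def sanitize_table_diff_options(options, strict=False):
    """Disable options that would print row-level data into PR comments.

    In strict mode this raises; otherwise it warns and forces the option off.
    The input dict is not modified; a sanitized copy is returned.
    """
    sanitized = dict(options)
    for name in FORBIDDEN_OPTIONS:
        if sanitized.get(name):
            message = (
                f"'{name}' is not allowed in the CI/CD bot: "
                "sample rows could expose data in PR comments"
            )
            if strict:
                raise ValueError(message)
            warnings.warn(message)
            sanitized[name] = False
    return sanitized
```

Whether to warn or fail is a policy choice; the `strict` flag keeps both behaviors available.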

sungchun12 added the Feature (Adds new functionality) label on Oct 25, 2024.