Make SQLMesh smarter by validating at compile time whether a column exists BEFORE running queries in the query engine.
Scenario: I have a SQL model below to which I intentionally add a fake column that does not exist. The upstream model tcloud_demo.seed_raw_payments has its columns defined explicitly. When I run a new plan, it fails as expected at the query engine level. I want it to fail BEFORE that happens. This would make developing in SQLMesh much faster, since users wouldn't have to wait for the query engine to surface obvious errors, and it would incentivize them to explicitly define their columns.
Note: this will not solve for models defined as select *, which is a fair expectation.
(.venv) ➜ tobiko-cloud-demo git:(demo-sung) ✗ sqlmesh plan dev
======================================================================
Successfully Ran 1 tests against duckdb
----------------------------------------------------------------------
Summary of differences against `dev`:
Models:
├── Directly Modified:
│ └── tcloud_demo__dev.stg_payments
└── Indirectly Modified:
├── tcloud_demo__dev.orders
└── tcloud_demo__dev.customers
---
+++
@@ -16,5 +16,6 @@
order_id,
payment_method,
amount / 100 AS amount,
- 'example_column' AS example_column
+ 'example_column' AS example_column,
+ fake_column
FROM tcloud_demo.seed_raw_payments
Directly Modified: tcloud_demo__dev.stg_payments (Breaking)
└── Indirectly Modified Children:
├── tcloud_demo__dev.customers (Indirect Breaking)
└── tcloud_demo__dev.orders (Indirect Breaking)
Models needing backfill (missing dates):
├── tcloud_demo__dev.customers: 2024-10-30 - 2024-10-30
├── tcloud_demo__dev.orders: 2024-10-30 - 2024-10-30
└── tcloud_demo__dev.stg_payments: 2024-10-30 - 2024-10-30
Enter the backfill start date (eg. '1 year', '2020-01-01') or blank to backfill from the
beginning of history:
Enter the backfill end date (eg. '1 month ago', '2020-01-01') or blank to backfill up until '2024-10-31 00:00:00':
Apply - Backfill Tables [y/n]: y
Creating physical table ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━ 62.5% • 5/8 • 0:00:04
2024-10-31 13:45:24,191 - ThreadPoolExecutor-2_0 - sqlmesh.core.renderer - WARNING - Column '"fake_column"' could not be resolved for model '"sqlmesh-public-demo"."tcloud_demo"."stg_payments"', the column may not exist or is ambiguous (renderer.py:517)
Creating physical table ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━ 62.5% • 5/8 • 0:00:05
2024-10-31 13:45:25,427 - MainThread - sqlmesh.core.context - ERROR - Apply Failure: Traceback (most recent call last):
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/utils/concurrency.py", line 69, in _process_node
self.fn(node)
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/utils/concurrency.py", line 165, in <lambda>
lambda s_id: fn(snapshots_by_id[s_id]),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/snapshot/evaluator.py", line 300, in <lambda>
lambda s: self._create_snapshot(
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/snapshot/evaluator.py", line 716, in _create_snapshot
evaluation_strategy.create(
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/snapshot/evaluator.py", line 1618, in create
self.adapter.create_view(
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/engine_adapter/shared.py", line 302, in internal_wrapper
return func(*list_args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/engine_adapter/base.py", line 997, in create_view
self.execute(
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/engine_adapter/base.py", line 2019, in execute
self._execute(sql, **kwargs)
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh_enterprise/engine_adapter/bigquery.py", line 20, in _execute
super()._execute(sql, **kwargs)
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/engine_adapter/bigquery.py", line 845, in _execute
results = self._db_call(
^^^^^^^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/engine_adapter/bigquery.py", line 803, in _db_call
return func(
^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/google/cloud/bigquery/job/query.py", line 1676, in result
while not is_job_done():
^^^^^^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
return retry_target(
^^^^^^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
_retry_error_helper(
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
raise final_exc from source_exc
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
result = target()
^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/google/cloud/bigquery/job/query.py", line 1625, in is_job_done
raise job_failed_exception
google.api_core.exceptions.BadRequest: 400 Unrecognized name: fake_column at [1:260]; reason: invalidQuery, location: query, message: Unrecognized name: fake_column at [1:260]
Location: US
Job ID: 425f534f-62ed-4617-8a66-f792dc44eade
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/context.py", line 1288, in apply
self._apply(plan, circuit_breaker)
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh_enterprise/context.py", line 98, in _apply
raise e
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh_enterprise/context.py", line 95, in _apply
super()._apply(plan, circuit_breaker)
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/context.py", line 1866, in _apply
self._scheduler.create_plan_evaluator(self).evaluate(
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/plan/evaluator.py", line 115, in evaluate
self._push(plan, snapshots, deployability_index_for_creation)
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/plan/evaluator.py", line 211, in _push
self.snapshot_evaluator.create(
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/core/snapshot/evaluator.py", line 298, in create
concurrent_apply_to_snapshots(
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/utils/concurrency.py", line 163, in concurrent_apply_to_snapshots
return concurrent_apply_to_dag(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/utils/concurrency.py", line 205, in concurrent_apply_to_dag
).run()
^^^^^
File "/Users/sung/Desktop/git_repos/tobiko-cloud-demo/.venv/lib/python3.12/site-packages/sqlmesh/utils/concurrency.py", line 64, in run
self._finished_future.result()
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
sqlmesh.utils.concurrency.NodeExecutionFailedError: Execution failed for node SnapshotId<"sqlmesh-public-demo"."tcloud_demo"."stg_payments": 633964984>
(context.py:1296)
Error: Failed processing SnapshotId<"sqlmesh-public-demo"."tcloud_demo"."stg_payments": 633964984>. 400 Unrecognized name: fake_column at [1:260]; reason: invalidQuery, location: query, message: Unrecognized name: fake_column at [1:260]
Location: US
Job ID: 425f534f-62ed-4617-8a66-f792dc44eade
MODEL (
name tcloud_demo.stg_payments,
cron '@daily',
grain payment_id,
audits (UNIQUE_VALUES(columns = (
payment_id
)), NOT_NULL(columns = (
payment_id
)))
);
SELECT
  id AS payment_id,
  order_id,
  payment_method,
  amount / 100 AS amount, /* `amount` is currently stored in cents, so we convert it to dollars */
  'example_column' AS example_column, /* advanced change categorization example */
  -- 'new_column' AS new_column_from_sung, /* non-breaking change example */
  fake_column
FROM tcloud_demo.seed_raw_payments
MODEL (
name tcloud_demo.seed_raw_payments,
kind SEED (
path '../seeds/raw_payments.csv'
),
columns (
id INT64,
order_id INT64,
payment_method STRING(50),
amount INT64
),
grain (id, user_id)
)
sungchun12 changed the title from "Better compile time checks if the column exists or not for both" to "Better compile time checks if the column exists or not" on Oct 31, 2024.