rows_affected not returned by adapter #351

Open
mvdwielen opened this issue May 23, 2023 · 5 comments · May be fixed by #883
Labels
enhancement New feature or request help wanted Extra attention is needed Stale

Comments

@mvdwielen

I am using the dbt artifacts package to monitor ETL runs. dbt artifacts supports logging the number of rows affected, but it requires the adapter to return this information. Currently the dbt-databricks adapter doesn't seem to return it in the result object, so the column is empty.

Would it be possible to return this information for all materializations where data is persisted (at least the table and incremental materializations)?

@mvdwielen mvdwielen added the enhancement New feature or request label May 23, 2023
@mvdwielen
Author

What would be the timeline for this feature to become available? Having insight into rows_affected on persisted tables (materialization = table or incremental) is key to:
- understanding whether a model accidentally loads incorrect or duplicated data because of faulty logic in the model.
- seeing what data volume is processed and identifying growth in data volume over time.

A possible solution direction (though there may be a smarter, more efficient, and more secure way to accomplish this):

Databricks creates Delta Lake tables by default. Delta Lake tables preserve the history of the CRUD operations executed on the table.

For a table materialization, the number of rows affected can be fetched with the following steps:
- Run DESCRIBE HISTORY schema.table_name (for example, DESCRIBE HISTORY x239_int.campaign).
- Get the last record whose operation column matches the statement that created the table (for example, CREATE OR REPLACE TABLE AS SELECT).
- Get the number of rows written from column operationMetrics -> numOutputRows.
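
The steps above could be sketched roughly as follows. This is only an illustration, not the adapter's code: it assumes the DESCRIBE HISTORY rows have already been fetched and converted to dicts, that the metric key is `numOutputRows` (the name Delta Lake uses), and that a table materialization shows up as one of the operations in `TABLE_OPERATIONS`. The function name `rows_written` is hypothetical.

```python
from typing import Iterable, Mapping, Optional

# Assumption: operation names recorded when a table is (re)created from a query.
TABLE_OPERATIONS = {"CREATE OR REPLACE TABLE AS SELECT", "CREATE TABLE AS SELECT"}


def rows_written(history: Iterable[Mapping], operations=TABLE_OPERATIONS) -> Optional[int]:
    """Return numOutputRows of the newest matching history record, or None."""
    matching = [row for row in history if row["operation"] in operations]
    if not matching:
        return None
    # DESCRIBE HISTORY versions increase over time, so the highest one is the latest.
    latest = max(matching, key=lambda row: row["version"])
    # operationMetrics is a string-to-string map, so the count needs converting.
    value = (latest.get("operationMetrics") or {}).get("numOutputRows")
    return int(value) if value is not None else None


# Hand-written sample rows standing in for real DESCRIBE HISTORY output.
sample_history = [
    {"version": 2, "operation": "OPTIMIZE", "operationMetrics": {}},
    {"version": 1, "operation": "CREATE OR REPLACE TABLE AS SELECT",
     "operationMetrics": {"numOutputRows": "1200"}},
    {"version": 0, "operation": "CREATE OR REPLACE TABLE AS SELECT",
     "operationMetrics": {"numOutputRows": "900"}},
]
print(rows_written(sample_history))
```

Filtering on the operation first means maintenance entries such as OPTIMIZE, which may be newer than the model run, are ignored.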


For an incremental materialization, the number of rows affected can be retrieved the same way as for a table materialization, except that the operation column should equal WRITE or MERGE, depending on whether the incremental strategy is append, insert_overwrite, or merge. The number of rows affected can again be read from the operationMetrics column.
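
A sketch of the incremental case, under similar assumptions: history rows are plain dicts, append and insert_overwrite are recorded as WRITE, and merge is recorded as MERGE. Which metric counts as "rows affected" for a MERGE is a judgment call; summing the inserted, updated, and deleted target rows is one assumption made here, not something the adapter defines. `incremental_rows_affected` is a hypothetical name.

```python
from typing import Iterable, Mapping, Optional

# Assumed mapping from dbt incremental strategy to the Delta history operation.
STRATEGY_OPERATION = {"append": "WRITE", "insert_overwrite": "WRITE", "merge": "MERGE"}

# Delta MERGE metrics that are summed here as "rows affected" (an assumption).
MERGE_METRICS = ("numTargetRowsInserted", "numTargetRowsUpdated", "numTargetRowsDeleted")


def incremental_rows_affected(history: Iterable[Mapping], strategy: str) -> Optional[int]:
    """Return the rows affected by the newest WRITE or MERGE, or None if absent."""
    operation = STRATEGY_OPERATION[strategy]
    matching = [row for row in history if row["operation"] == operation]
    if not matching:
        return None
    metrics = max(matching, key=lambda row: row["version"]).get("operationMetrics") or {}
    if operation == "MERGE":
        # Missing metrics are treated as zero.
        return sum(int(metrics.get(key, 0)) for key in MERGE_METRICS)
    value = metrics.get("numOutputRows")
    return int(value) if value is not None else None
```

For example, a MERGE record with 5 inserted, 3 updated, and 0 deleted target rows would be reported as 8 rows affected.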

The logic above should run directly after the SQL or Python code defined in the dbt model has executed.
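
The ordering could look something like the sketch below. `run_sql` is a hypothetical callable that executes a statement and returns rows as dicts; it is injected so that the example does not depend on any particular connection API. Reading only the newest history entry right after the model statement assumes no other writer touches the table in between.

```python
from typing import Callable, List, Mapping, Optional


def run_and_count(
    run_sql: Callable[[str], List[Mapping]],
    relation: str,
    model_sql: str,
) -> Optional[int]:
    """Execute the model statement, then read the newest history entry."""
    run_sql(model_sql)
    # LIMIT 1 returns only the most recent operation on the table.
    history = run_sql(f"DESCRIBE HISTORY {relation} LIMIT 1")
    if not history:
        return None
    value = (history[0].get("operationMetrics") or {}).get("numOutputRows")
    return int(value) if value is not None else None
```

If concurrent writes are possible, filtering the history on the expected operation, as described above, would be safer than trusting the newest record.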

@gpodevijn

Hi,

Is there any chance that this feature gets planned in upcoming releases?

Thanks!

@benc-db
Collaborator

benc-db commented Sep 13, 2023

We are taking it into consideration for planning. If you would like to expedite, consider submitting a PR with an implementation :).

@gpodevijn

Hi @benc-db. Thanks for your feedback. Sorry if my comment sounded like a demand; it was not meant as one. Should I need it much sooner, I will definitely look into contributing! And thank you for the adapter, it is very useful.

@benc-db benc-db added the help wanted Extra attention is needed label Oct 3, 2023

github-actions bot commented Apr 1, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.


3 participants