Adding minimal write option when creating table with Delta format #416

Merged

Jeremynadal33
Contributor

@Jeremynadal33 Jeremynadal33 commented Aug 5, 2024

resolves part of #415

Description

Adds a write option for creating a table with the Delta file format when using an incremental strategy. For example, it lets --full-refresh succeed after a schema change once you add the config delta_create_table_write_options = {"mergeSchema": "true"} to your model.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-glue next" section.

Credits: Nicolas Fourmann initially found this solution!

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

FYI: I am not sure whether this change needs testing. If it does, I would be interested in understanding how to test it!

Tests that were run locally

A simple model with a minimal incremental merge strategy:

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['id'],
        file_format='delta',
    )
}}

with incoming_data as (
    select 1 as id
    union all
    select 2 as id
)

select * from incoming_data

The first run succeeds, but after adding a column and running:

dbt run -s test_change_schema --full-refresh

I got an error from the Glue adapter:

08:23:55  Glue adapter: Glue returned `error` for statement None for code 

spark.sql("""


with incoming_data as (
    select 1 as id, 'a' as new_col
    union all
    select 2 as id, 'a' as new_col
)

select * from incoming_data
""").write.format("delta").mode("overwrite").save("s3://xxx/xxx/test_change_schema")
SqlWrapper2.execute("""select 1""")
, AnalysisException: A schema mismatch detected when writing to the Delta table (Table ID: xxx).
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.

Table schema:
root
-- id: integer (nullable = true)


Data schema:
root
-- id: integer (nullable = true)
-- new_col: string (nullable = true)

         
To overwrite your schema or change partitioning, please set:
'.option("overwriteSchema", "true")'.

Note that the schema can't be overwritten when using
'replaceWhere'.

After adding the config, the model looks like this:

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['id'],
        write_options = {"mergeSchema": "true"},
        file_format='delta',
    )
}}

with incoming_data as (
    select 1 as id, 'a' as new_col
    union all
    select 2 as id, 'a' as new_col
)

select * from incoming_data

The same command now passes, and my table has its schema updated.
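
To illustrate why the config fixes the error above, here is a minimal sketch of how a dict of write options could be rendered into the writer chain that the adapter sends to Glue. The helper names `render_write_options` and `render_overwrite_statement` are hypothetical and are not the adapter's actual code. The sketch only mirrors the shape of the statement shown in the error log, with the `.option("mergeSchema", "true")` call that the Delta error message asks for.

```python
from typing import Mapping, Optional


def render_write_options(write_options: Optional[Mapping[str, str]] = None) -> str:
    """Render a dict of writer options as chained .option(...) calls.

    Hypothetical helper for illustration. Keys and values are inserted
    as-is, with no escaping of embedded quotes.
    """
    if not write_options:
        return ""
    return "".join(
        f'.option("{key}", "{value}")' for key, value in write_options.items()
    )


def render_overwrite_statement(
    sql: str,
    location: str,
    write_options: Optional[Mapping[str, str]] = None,
) -> str:
    """Build the PySpark statement string for an overwrite of a Delta table.

    Hypothetical helper: it reproduces the statement shape seen in the
    error log and adds the rendered options before .mode("overwrite").
    """
    options = render_write_options(write_options)
    return (
        f'spark.sql("""{sql}""")'
        f'.write.format("delta"){options}.mode("overwrite").save("{location}")'
    )


if __name__ == "__main__":
    # Same placeholder location as in the log above.
    print(
        render_overwrite_statement(
            "select 1 as id, 'a' as new_col",
            "s3://xxx/xxx/test_change_schema",
            {"mergeSchema": "true"},
        )
    )
```

Without options, the rendered statement matches the failing one in the log. With {"mergeSchema": "true"}, the writer is allowed to add new_col to the existing table schema. The error message also mentions overwriteSchema, which replaces the schema on overwrite rather than merging new columns into it.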

@Jeremynadal33 Jeremynadal33 force-pushed the feature/add-delta-write-option branch from b806ef6 to 829de45 Compare August 6, 2024 07:50
@moomindani moomindani self-assigned this Sep 6, 2024
@moomindani moomindani added the enable-functional-tests label Sep 6, 2024
@moomindani moomindani merged commit e2f30ab into aws-samples:main Sep 12, 2024
5 checks passed
@moomindani
Collaborator

Thank you for your contribution.

Labels: beginning-contributor, enable-functional-tests