Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge_exclude_columns not working in iceberg incremental models #444

Open
e-quili opened this issue Sep 23, 2024 · 1 comment
Open

merge_exclude_columns not working in iceberg incremental models #444

e-quili opened this issue Sep 23, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@e-quili
Copy link

e-quili commented Sep 23, 2024

Describe the bug

The parameter merge_exclude_columns, which excludes the specified column from being updated with the merge strategy, seems to be ignored with iceberg tables.

Current behavior

The config below updates all the columns of the materialized table, including "created_at" column.

{{ 
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='table_id',
        file_format= 'iceberg',
        merge_exclude_columns = ['created_at']
    )
}} 

Expected behavior

The setting merge_exclude_columns='created_at' should avoid updating the created_at column.

Screenshots and log output

We don't get any error message, even when we specify in merge_exclude_columns a column that doesn't exist.

System information

The output of dbt --version:

Core:
  - installed: 1.8.6
  - latest:    1.8.6 - Up to date!

Plugins:
  - glue:   1.8.1 - Up to date!
  - spark:  1.8.0 - Up to date!

The operating system you're using:
Windows

The output of python --version:
Python 3.12.6

Additional context

Same issue with merge_update_columns.

@e-quili e-quili added the bug Something isn't working label Sep 23, 2024
@showell-nex
Copy link

I've been having this problem too. I've dug through the code and the offending function is located here.

The iceberg upsert functionality in general needs building out but the actually buggy code is here whereby the current timestamp is added as an update timestamp BEFORE any comparisons are done between the source and the target data, causing the upsert string to always evaluate to not matched and insert new rows every time as the update timestamp column will always have a value in it that doesn't match any previously written rows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants