I'd like to build an SCD type 2 table with Hudi. When I use Hudi with the merge strategy for an incremental model and the unique_key is a composite key, duplicate records are created. I'm not sure whether this is a bug in dbt-glue, in Hudi on Glue interactive sessions, or in Hudi itself, but the _hoodie_record_key seems to be incorrect when the model is built for the first time.
Steps To Reproduce
Create a model:
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id,expiry_date',
    file_format='hudi',
    hudi_options={
        'hoodie.datasource.write.precombine.field': 'ingestion_date',
    }
) }}
WITH source_data AS (
    SELECT id, title, ingestion_date
    FROM VALUES
        ('10000', 'Bread 123', '2023-08-01')
    AS data(id, title, ingestion_date)
)
select id, title, ingestion_date, '9999-12-31' as expiry_date from source_data
Build the model. The table is shown as below when queried from Athena:
Add a new record to the data, and build the model:
WITH source_data AS (
    SELECT id, title, ingestion_date
    FROM VALUES
        ('10010', 'Bread 123', '2023-08-01')
    AS data(id, title, ingestion_date)
)
select id, title, ingestion_date, '9999-12-31' as expiry_date from source_data
The second option was necessary to prevent this error from occurring: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.ComplexKeyGenerator
Note that the _hoodie_record_key is 10000,9999-12-31.
Note that the _hoodie_record_key from the 2nd record is [10000,9999-12-31](id:10000,expiry_date:9999-12-31).
There is no difference from the previous step.
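To make the suspected cause concrete, here is a minimal sketch in plain Python (not Hudi code). It assumes the first build stored a plain comma-joined key while later builds store a field:value key, as the values above suggest. The helpers plain_key, labelled_key and upsert are hypothetical names used only for illustration. An upsert that matches rows by record key then sees two different keys for the same (id, expiry_date) pair:

```python
# Hypothetical helpers, not Hudi APIs: they only mimic the two key shapes
# reported in this issue.
def plain_key(record, fields):
    """Join the raw field values, e.g. '10000,9999-12-31'."""
    return ",".join(str(record[f]) for f in fields)


def labelled_key(record, fields):
    """Join 'field:value' pairs, e.g. 'id:10000,expiry_date:9999-12-31'."""
    return ",".join(f"{f}:{record[f]}" for f in fields)


def upsert(table, key, record):
    """Insert the row, or overwrite the row already stored under this key."""
    table[key] = record


fields = ["id", "expiry_date"]
record = {
    "id": "10000",
    "title": "Bread 123",
    "ingestion_date": "2023-08-01",
    "expiry_date": "9999-12-31",
}

table = {}
first_key = plain_key(record, fields)      # assumed shape on the first build
second_key = labelled_key(record, fields)  # assumed shape on the second build
third_key = labelled_key(record, fields)   # a third build reuses the same shape

upsert(table, first_key, record)
upsert(table, second_key, record)
upsert(table, third_key, record)

print(sorted(table))  # the distinct record keys now in the table
print(len(table))     # number of rows kept for one logical record
```

Under these assumptions the table ends up with two rows for one logical record, and further builds add no more, which matches the behaviour described in this report.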
The result is as below:
The new record's _hoodie_record_key is id:10010,expiry_date:9999-12-31. If the model is rebuilt again, no further duplicates are created. So I assume the _hoodie_record_key is incorrect when the table is first created.
Expected behavior
There should be a single record, as the composite primary keys (id, expiry_date) are the same.
Screenshots and log output
N/A
System information
The output of dbt --version:
The operating system you're using: Ubuntu
The output of python --version: Python 3.8.16
Additional context
Config in profiles.yaml: