full-refresh on insert_overwrite strategy #211

Open
ghost opened this issue Aug 16, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

ghost commented Aug 16, 2023

Describe the feature

When a model is created with the insert_overwrite incremental strategy, the --full-refresh flag does not seem to take effect.

Describe alternatives you've considered

From the code, it seems that --full-refresh is only supported for views in the incremental materialization:

https://github.com/aws-samples/dbt-glue/blob/main/dbt/include/glue/macros/materializations/incremental/incremental.sql#L63

Additional context

Steps to reproduce:

  1. Create a model with 25 columns and build it with dbt run.
  2. Add a new column to the model and run dbt run again.

The second run fails with an error like this:

`xxx`.`xxxx` requires that the data to be inserted have the same number of columns as the target table: target table has 25 column(s) but the inserted data has 26 column(s), including 0 partition column(s) having constant value(s).
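
The same kind of failure can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Spark; it is not what dbt-glue executes, and the table name `target` and the two-column schema are made up for illustration. It only shows that inserting a SELECT with one more column than the existing table is rejected.

```python
import sqlite3


def rerun_with_extra_column() -> str:
    """Build a table, then try to insert rows that have one extra column.

    SQLite is an analogy only; the table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # First run: create the target table (2 columns here, 25 in the report).
        conn.execute("CREATE TABLE target AS SELECT 1 AS a, 2 AS b")
        # Second run: the model's SELECT now yields one more column than the table.
        conn.execute("INSERT INTO target SELECT 1, 2, 3")
    except sqlite3.OperationalError as exc:
        # The engine refuses the insert because the column counts differ.
        return f"rejected: {exc}"
    finally:
        conn.close()
    return "inserted"


if __name__ == "__main__":
    print(rerun_with_extra_column())
```

The insert is refused because the existing table still has its original schema, which is the situation a working --full-refresh would avoid by rebuilding the table.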

Who will this benefit?

When the schema changes and users want to rebuild the table, --full-refresh would be helpful.

Are you interested in contributing this feature?

Yes.

ghost added the enhancement label Aug 16, 2023

ghost commented Aug 16, 2023

My understanding is that in dbt-core, a full refresh backs up the old table by renaming it, builds the new table for the model, and then drops the backup.

https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/models/incremental/incremental.sql#L64-L68

However, dbt-glue (and dbt-spark) use external tables. Even if we rename the table, the data stays in the same location, so the backup doesn't make sense.
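
To make the concern concrete, here is a toy model in plain Python, not dbt-glue code. It assumes, as described above, that a rename only changes the catalog entry and that the rebuilt table writes to the same storage location. The function name, the `__dbt_backup` suffix, and the example path are all hypothetical.

```python
from typing import Dict, List


def refresh_with_rename_backup(
    catalog: Dict[str, str],
    storage: Dict[str, List[str]],
    table: str,
    new_rows: List[str],
) -> List[str]:
    """Run a backup/rebuild/drop cycle and return what the backup held.

    catalog maps table name -> storage location (metadata only).
    storage maps location -> rows (the actual data files).
    """
    backup = f"{table}__dbt_backup"  # hypothetical backup name
    location = catalog[table]

    # Step 1: "back up" by renaming. Only the catalog entry moves.
    catalog[backup] = catalog.pop(table)

    # Step 2: rebuild the model. Assumption: it writes to the same location.
    catalog[table] = location
    storage[location] = list(new_rows)

    # The backup points at the same location, so it now exposes the new rows.
    backup_rows = storage[catalog[backup]]

    # Step 3: drop the backup entry.
    del catalog[backup]
    return backup_rows


if __name__ == "__main__":
    catalog = {"orders": "s3://example-bucket/orders/"}  # hypothetical path
    storage = {"s3://example-bucket/orders/": ["old row"]}
    print(refresh_with_rename_backup(catalog, storage, "orders", ["new row"]))
```

Under these assumptions the backup never preserves the old rows, because both names resolve to the same files.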

How about simply dropping the table, and only when --full-refresh is passed?
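
A minimal sketch of that decision logic, written in Python rather than as a dbt macro. The function name and the step labels are hypothetical, and this is one possible reading of the proposal rather than the dbt-glue implementation. Whether the files at the table location also need to be cleared is left open.

```python
from typing import List


def plan_incremental_build(table_exists: bool, full_refresh: bool) -> List[str]:
    """Return the ordered steps for one run of an insert_overwrite model.

    Step labels are illustrative, not real dbt-glue macro names.
    """
    if not table_exists:
        # First run: nothing to refresh, just create the table.
        return ["create table as select"]
    if full_refresh:
        # Proposed: drop the existing table so the new schema can be applied.
        return ["drop table", "create table as select"]
    # Normal incremental run: keep the table and overwrite its data.
    return ["insert overwrite"]


if __name__ == "__main__":
    print(plan_incremental_build(table_exists=True, full_refresh=True))
```

With this shape, a schema change is only picked up when the user explicitly asks for a full refresh; ordinary runs still go through insert overwrite.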

HaykManukyanAvetiky commented

Is there any update on this? I am having a hard time using Hudi with full refresh, and this functionality is very important. It would also allow Hudi indexes to be used for full-load tables and improve performance.
