dlt version
0.4.12

Describe the problem
dlt creates a new working folder in /tmp for (1) each DAG creation and (2) each actual DAG execution. In case of (2) we clean the working directory after the run (see the _run method of the task group); in case of (1) we do not, i.e. in this snippet:

```py
@dag(
    schedule_interval='@daily',
    start_date=pendulum.datetime(2023, 7, 1),
    catchup=False,
    max_active_runs=1,
    default_args=default_task_args
)
def load_data():
    # set `use_data_folder` to True to store temporary data on the `data` bucket. Use only when it does not fit on the local storage
    tasks = PipelineTasksGroup("pipeline_decomposed", use_data_folder=False, wipe_local_data=True)

    # import your source from the pipeline script
    from github import github_repo_events
    source = github_repo_events("apache", "airflow")

    # modify the pipeline parameters
    pipeline = dlt.pipeline(
        pipeline_name='pipeline_name',
        dataset_name='dataset_name',
        destination='duckdb',
        full_refresh=False  # must be False if we decompose
    )
    # create the source; the "serialize" decompose option converts dlt resources into Airflow tasks. Use "none" to disable it
    tasks.add_run(pipeline, source, decompose="serialize", trigger_rule="all_done", retries=0, provide_context=True)
```

pipeline creates a temp working dir (just to create the DAG), the code ends, and the working dir is left behind.

Expected behavior
dlt will clean the working folder created during DAG creation, observing the wipe_local_data=True flag of PipelineTasksGroup.

In order to fix it we could convert our task group into a context manager (see the sketch below). What happens here:
- tasks.add_run keeps the pipeline instance(s) so it can use them to wipe the folder
- on __exit__, all working folders are wiped for all pipelines (flag permitting)
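A minimal sketch of what that context-manager variant could look like. This is illustrative only: the real PipelineTasksGroup constructor takes more arguments, and the wipe on __exit__ should reuse whatever cleanup _run already performs; _wipe_working_folder() below is an assumed (private) pipeline method standing in for that call.

```py
from airflow.utils.task_group import TaskGroup


class PipelineTasksGroup(TaskGroup):
    """Sketch only: constructor arguments reduced to the ones relevant here."""

    def __init__(self, pipeline_name, use_data_folder=False, wipe_local_data=True, **kwargs):
        super().__init__(group_id=pipeline_name, **kwargs)
        self.wipe_local_data = wipe_local_data
        self._pipelines = []  # pipelines registered via add_run

    def add_run(self, pipeline, data, **kwargs):
        # keep the pipeline instance(s) so their working folders can be wiped on __exit__
        self._pipelines.append(pipeline)
        # ... existing task-creation logic stays unchanged ...

    def __exit__(self, exc_type, exc_val, exc_tb):
        # wipe all working folders created during DAG creation (flag permitting);
        # _wipe_working_folder() is an assumption - reuse whatever _run already calls
        if self.wipe_local_data:
            for pipeline in self._pipelines:
                pipeline._wipe_working_folder()
        return super().__exit__(exc_type, exc_val, exc_tb)
```

The DAG code above would then register the runs inside a with block:

```py
with PipelineTasksGroup("pipeline_decomposed", use_data_folder=False, wipe_local_data=True) as tasks:
    tasks.add_run(pipeline, source, decompose="serialize", trigger_rule="all_done", retries=0, provide_context=True)
```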
Steps to reproduce
Please write a test that reproduces this behavior
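One possible starting point for such a test (a sketch, not the final test: it assumes the leftover folder is created directly under tempfile.gettempdir(), which should be confirmed against the helper source, and it uses a hypothetical dummy_source in place of the GitHub source from the snippet):

```py
import os
import tempfile

import dlt
import pendulum
from airflow.decorators import dag
from dlt.helpers.airflow_helper import PipelineTasksGroup


@dlt.source
def dummy_source():
    # small stand-in source so the test does not depend on the GitHub pipeline script
    @dlt.resource
    def dummy_data():
        yield [{"id": 1}, {"id": 2}]

    return dummy_data


def test_dag_creation_leaves_no_working_folder():
    before = set(os.listdir(tempfile.gettempdir()))

    @dag(schedule_interval="@daily", start_date=pendulum.datetime(2023, 7, 1), catchup=False)
    def load_data():
        tasks = PipelineTasksGroup("pipeline_decomposed", use_data_folder=False, wipe_local_data=True)
        pipeline = dlt.pipeline(
            pipeline_name="pipeline_name",
            dataset_name="dataset_name",
            destination="duckdb",
            full_refresh=False,
        )
        tasks.add_run(pipeline, dummy_source(), decompose="serialize", trigger_rule="all_done", retries=0, provide_context=True)

    # building the DAG object alone is what triggers the working folder creation
    load_data()

    # naive check: DAG creation should not leave anything new behind in the temp dir
    after = set(os.listdir(tempfile.gettempdir()))
    assert after == before
```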
Operating system
Linux
Runtime environment
Airflow
Python version
3.11
dlt data source
No response
dlt destination
No response
Other deployment details
No response
Additional information
No response