Add robust dry run capability for backfill #44395

Open
1 task done
dstandish opened this issue Nov 26, 2024 · 4 comments · May be fixed by #45062
Assignees
Labels
area:API Airflow's REST/HTTP API area:backfill Specifically for backfill related kind:feature Feature Requests kind:meta High-level information important to the community

Comments

@dstandish
Contributor

dstandish commented Nov 26, 2024

Body

Child of parent issue #43970

As a user, you want to be able to dry run the backfill creation process from the UI. E.g., I click "create backfill" and give it a range; then I want to be able to see, in the UI, the runs that will be created if I click "submit".

In order to do this, we'll have to refactor the backfill creation process a bit. Right now, we just submit a range, and the backfill endpoint will just create the backfill object and all of the runs.

One problem with implementing dry run: suppose we return "these runs will be created; proceed?". What if the scheduler schedules, or a user clears or deletes, a run in the range before the user confirms? Then we would not end up doing exactly what we said we were going to do.

So what we need to do is somehow implement in the API the ability to get some representation of the entirety of the backfill -- the object and its runs -- and then the user could submit that back to another endpoint, which would just receive this payload and attempt to create it. In this second endpoint, which is essentially "take the payload and create", we would first lock the dag and then attempt to insert all the rows. If we find a conflict, we should abandon the whole attempt and tell the user: sorry, something changed, we got a conflict, please try again. There's a 409 Conflict API response that would seem to be appropriate here.
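To make the two-stage idea concrete, here is a minimal in-memory sketch, not Airflow code. `DagRunStore`, `plan_backfill`, `create_backfill`, and `BackfillConflict` are hypothetical names invented for illustration; a `threading.Lock` stands in for locking the dag, and a set of dates stands in for the dag run table.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
import threading


class BackfillConflict(Exception):
    """Raised when a planned run already exists; an API would map this to HTTP 409."""

    status_code = 409


@dataclass
class DagRunStore:
    """Hypothetical stand-in for the dag run table of a single dag."""

    existing: set = field(default_factory=set)
    lock: threading.Lock = field(default_factory=threading.Lock)


def plan_backfill(store, start, end):
    """Stage 1 (dry run): list the logical dates that would get a new run."""
    days = (end - start).days + 1
    candidates = [start + timedelta(days=i) for i in range(days)]
    return [d for d in candidates if d not in store.existing]


def create_backfill(store, planned):
    """Stage 2: insert exactly the planned runs, or nothing at all."""
    with store.lock:  # stands in for locking the dag
        conflicts = [d for d in planned if d in store.existing]
        if conflicts:
            # Abandon the whole attempt; nothing has been inserted yet.
            raise BackfillConflict(f"runs already exist for {conflicts}; please retry")
        store.existing.update(planned)
    return list(planned)
```

If a run appears in the range between the two calls, `create_backfill` raises before inserting anything, so the user never ends up with a partial backfill that differs from the preview.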

cc @phanikumv @jedcunningham @bbovenzi @pierrejeambrun

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@dstandish dstandish added area:API Airflow's REST/HTTP API kind:meta High-level information important to the community area:backfill Specifically for backfill related labels Nov 26, 2024
@dosubot dosubot bot added the kind:feature Feature Requests label Nov 26, 2024
@phanikumv phanikumv removed their assignment Nov 28, 2024
@vatsrahul1001 vatsrahul1001 self-assigned this Dec 3, 2024
@bbovenzi
Contributor

bbovenzi commented Dec 3, 2024

This makes sense to me. I think it's very important for users to know exactly what they're about to change.

We can make sure the UI specifically handles the 409 response in the create backfill flow.

@vatsrahul1001
Collaborator

Assigning to @prabhusneha

@prabhusneha
Contributor

> Assigning to @prabhusneha

I will take this up

@phanikumv phanikumv moved this from Todo to In Progress in AIP-78 backfills in scheduler Dec 10, 2024
@phanikumv phanikumv moved this from Todo to In Progress in AIP-84 MODERN REST API Dec 10, 2024
@prabhusneha
Contributor

After further discussions with @dstandish, we decided to adopt an approach aligned with the current CLI dry run functionality, rather than implementing a two-stage process. Specifically, when a user requests a dry run of the backfill, the response will include only the DAG runs that will actually be created. Any DAG runs that would not be created, given the specified reprocess_behavior, will be excluded from the dry run response.
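A rough sketch of that filtering, under stated assumptions: the `ReprocessBehavior` values and their meaning below (`none` never re-runs an existing date, `failed` re-runs only failed ones, `completed` re-runs failed and successful ones) are assumed for illustration, and `dry_run_backfill` is a hypothetical helper, not the actual endpoint or the implementation in the linked pull request.

```python
from enum import Enum


class ReprocessBehavior(str, Enum):
    # Assumed values and semantics, for illustration only.
    FAILED = "failed"
    COMPLETED = "completed"
    NONE = "none"


def dry_run_backfill(candidate_dates, existing_states, reprocess_behavior):
    """Return only the logical dates for which a run would be created.

    existing_states maps a logical date to the state of the run that already
    exists for it; dates without an entry have no run yet.
    """
    if reprocess_behavior is ReprocessBehavior.NONE:
        reprocessable = set()
    elif reprocess_behavior is ReprocessBehavior.FAILED:
        reprocessable = {"failed"}
    else:
        reprocessable = {"failed", "success"}

    result = []
    for logical_date in candidate_dates:
        state = existing_states.get(logical_date)
        # Create a run if none exists, or if the existing one may be reprocessed.
        if state is None or state in reprocessable:
            result.append(logical_date)
    return result
```

Dates whose existing run is not eligible for reprocessing are simply left out of the result, which mirrors the "exclude what would not be created" behavior described above.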

@prabhusneha prabhusneha linked a pull request Dec 18, 2024 that will close this issue
Projects
Status: In Progress

5 participants