Only retry KubernetesPodOperator if the pod scheduling fails #44390

namelessjon · 2024-11-26T14:51:26Z

Hi

I have a bunch of jobs using the KubernetesPodOperator (on EKS, but that doesn't seem relevant to this discussion). Pod scheduling in our cluster is a little unreliable, so I set retries on the operator in Airflow to ride out transient failures. So far, so good.

However, independently of this, the jobs running in the pods sometimes fail (generally due to bad inputs somewhere upstream; whatever the reason, these failures are very unlikely to be resolved by trying again). Because of the retry policy they are retried anyway, which delays error notifications and wastes effort re-running a task that cannot succeed.

How can I configure things so that if the pod fails to schedule it will be retried, but if the task runs to completion, a non-zero exit code will be treated as if an AirflowFailException was raised?

Suggestions gratefully accepted!

EDIT: from some investigation, it seems the handling of the exit code all happens in the cleanup method, with no real option to interfere. But is that correct?
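
To make the behaviour I'm after concrete, here is a rough sketch of the decision only, not a working operator. It is plain Python with no Airflow imports: `PodOutcome`, `raise_for_outcome`, `FailWithoutRetry` and `RetryableFailure` are names I made up for illustration, with the two exception classes standing in for `AirflowFailException` and an ordinary `AirflowException`. Whether this logic can actually be hooked into the operator (for example by overriding `cleanup` in a subclass) is exactly the part I'm unsure about.

```python
from dataclasses import dataclass
from typing import Optional


class FailWithoutRetry(Exception):
    """Stand-in for AirflowFailException: fail the task and skip retries."""


class RetryableFailure(Exception):
    """Stand-in for an ordinary AirflowException: normal retry rules apply."""


@dataclass
class PodOutcome:
    """Hypothetical summary of how a pod run ended."""
    phase: str                        # pod phase, e.g. "Pending", "Succeeded", "Failed"
    exit_code: Optional[int] = None   # main container exit code; None if it never ran


def raise_for_outcome(outcome: PodOutcome) -> None:
    """Raise the exception matching the retry behaviour wanted for this outcome."""
    if outcome.phase == "Succeeded":
        return  # nothing to do, the task succeeded

    if outcome.exit_code is not None and outcome.exit_code != 0:
        # The container ran and reported a failure itself: retrying is pointless.
        raise FailWithoutRetry(f"container exited with code {outcome.exit_code}")

    # No exit code means the container never ran (e.g. the pod was never
    # scheduled), so leave it to the usual retries.
    raise RetryableFailure(f"pod ended in phase {outcome.phase!r} without running")
```

One caveat with this rule: a container that gets killed by the cluster (out of memory, node eviction) also ends up with a non-zero exit code, so it would be treated as non-retryable here unless those cases are told apart some other way.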
