Only retry KubernetesPodOperator if the pod scheduling fails #44390
Asked by namelessjon in Q&A · Unanswered · 0 replies
Hi
I have a bunch of jobs using the KubernetesPodOperator (via EKS, but that doesn't seem relevant to this discussion). Pod scheduling in our cluster is a little unreliable, so I set `retries` on the operator in Airflow to ride out transient failures. So far, so good.
However, independently of this, the jobs running in the pods sometimes fail (generally due to bad inputs somewhere upstream, but regardless of the reason, these failures are very unlikely to be resolved by trying again). Because of the retry policy, they are still retried, which delays error notifications and wastes effort re-running a task that cannot succeed.
How can I configure things so that if the pod fails to schedule it will be retried, but if the task runs to completion, a non-zero exit code will be treated as if an `AirflowFailException` was raised?
Suggestions gratefully accepted!
EDIT: from some investigation, it seems the handling of the exit code all happens in the cleanup method, with no real option to interfere. But is that correct?
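To make the intent concrete, here is a rough sketch of the decision I would like to make. It is not something the operator does today. It assumes the finished pod is available as a plain dict shaped like the Kubernetes Pod JSON (`status.containerStatuses[].state.terminated.exitCode`). The helper names `nonzero_exit_code` and `raise_for_pod_failure` are hypothetical, and where this could be hooked into the operator is exactly the open question. The local `AirflowFailException` fallback exists only so the snippet runs without Airflow installed.

```python
from typing import Any, Mapping, Optional

try:
    from airflow.exceptions import AirflowFailException
except ImportError:
    # Stand-in so the sketch runs without Airflow; in a real DAG the
    # import above is the one that matters.
    class AirflowFailException(Exception):
        """Fail the task immediately, without using remaining retries."""


def nonzero_exit_code(pod: Mapping[str, Any]) -> Optional[int]:
    """Return the first non-zero container exit code, or None.

    `pod` is assumed to be a dict in Kubernetes Pod JSON form. A pod that
    was never scheduled has no terminated container state, so this
    returns None for it.
    """
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for status in statuses:
        terminated = (status.get("state") or {}).get("terminated")
        if terminated is not None and terminated.get("exitCode"):
            return terminated["exitCode"]
    return None


def raise_for_pod_failure(pod: Mapping[str, Any]) -> None:
    """Hypothetical hook: call this once the pod is known to have failed."""
    code = nonzero_exit_code(pod)
    if code is not None:
        # The job itself ran and failed: retrying is pointless.
        raise AirflowFailException(f"container exited with code {code}")
    # The pod never ran to completion (e.g. scheduling failed):
    # raise an ordinary error so the normal retry policy applies.
    raise RuntimeError("pod did not run to completion")
```

One caveat with this approach: a container killed by the cluster (for example an out-of-memory kill) also ends up terminated with a non-zero exit code, so it would be treated as non-retryable too.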