Tasks fail without logs under heavy load #45078
Labels
area:core
area:logging
kind:bug
needs-triage
provider:cncf-kubernetes
Apache Airflow version
2.10.4
If "Other Airflow 2 version" selected, which one?
No response
What happened?
I have multiple dag_runs of a DAG running in parallel on a Kubernetes cluster with a single worker pod.
I use a parallelism of 16 and a retry_count of 4.
This DAG is composed of mapped tasks. The largest one spawns 36 mapped tasks.
Every day 100 dag_runs are spawned together, and the dag_run with the most tasks fails with 3/4 mapped tasks failed.
Those tasks fail after 4 retries, but most of the time I see the logs of only 1 or 2 executions.
Most of the time the log is:
This, for example, is attempt=2.log, and I don't have attempts 1, 3 or 4, either in the log files or in the UI. When I then clear the state of the failed tasks, they run correctly without errors.
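To make the gap easier to quantify, here is a minimal sketch of a helper that lists which attempt logs are missing for one task instance. It assumes the logs are readable from a local directory and that each attempt is stored as a file named attempt=N.log, as in the example above. The function name missing_attempts and its parameters are hypothetical, not part of Airflow.

```python
import re
from pathlib import Path

# Matches file names such as "attempt=2.log" and captures the attempt number.
ATTEMPT_RE = re.compile(r"^attempt=(\d+)\.log$")


def missing_attempts(task_log_dir, expected_attempts):
    """Return the attempt numbers in 1..expected_attempts that have no log file.

    task_log_dir is assumed to be the directory holding the attempt=N.log
    files of a single task instance (hypothetical layout for this sketch).
    """
    found = set()
    for path in Path(task_log_dir).iterdir():
        match = ATTEMPT_RE.match(path.name)
        if match and path.is_file():
            found.add(int(match.group(1)))
    return [n for n in range(1, expected_attempts + 1) if n not in found]
```

Running it over the directory of each failed mapped task, with expected_attempts set to the number of tries the UI reports, would show how many attempts left no log at all.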
What you think should happen instead?
I would like to see the logs for all attempts, and a clearer trace of what happened, so I can debug the problem.
How to reproduce
It mostly depends on the workload. On another instance with the same code but less load, it doesn't happen.
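For a rough sense of the load involved, the figures above can be combined into an upper bound. This is only a back-of-the-envelope sketch: it assumes every one of the 100 dag_runs spawns the full 36 mapped tasks (in practice only the largest ones do) and ignores retries. The function name scheduling_waves is hypothetical.

```python
import math


def scheduling_waves(dag_runs, tasks_per_run, parallelism):
    """Return (total task instances, minimum number of full-parallelism waves)."""
    total = dag_runs * tasks_per_run
    return total, math.ceil(total / parallelism)


# Figures from this report; tasks_per_run=36 is an upper bound, not a measurement.
total, waves = scheduling_waves(dag_runs=100, tasks_per_run=36, parallelism=16)
print(total, waves)  # 3600 225
```

So with a parallelism of 16, up to 3600 mapped task instances would have to pass through at least 225 rounds of execution, before counting any retries.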
Operating System
helm-chart on kubernetes
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
Kubernetes on GKE
Anything else?
I don't see any errors at the Kubernetes level or in the worker log.
Are you willing to submit PR?
Code of Conduct