Replies: 2 comments
-
Looks like your long-running task takes all the memory - the remaining processes start swapping out, slow down to a crawl, and compete for swap. Look for a memory leak in whatever task you have, or buy more memory if your task needs more.
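To confirm the swap hypothesis, you can watch `SwapTotal`/`SwapFree` in `/proc/meminfo` on the host while the task runs. A minimal sketch (the helper names are mine, not from any library):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info

def swap_pressure(info):
    """Fraction of swap in use (0.0 if no swap is configured)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 1.0 - info.get("SwapFree", 0) / total

# Sample text in /proc/meminfo format; on Linux you would instead read
# open("/proc/meminfo").read() periodically while the task runs.
sample = (
    "MemTotal:       16000000 kB\n"
    "MemAvailable:     500000 kB\n"
    "SwapTotal:       8000000 kB\n"
    "SwapFree:        2000000 kB\n"
)
info = parse_meminfo(sample)
print(round(swap_pressure(info), 2))  # 0.75
```

If this fraction climbs toward 1.0 while the long-running task executes, the slowdown of the other Airflow processes is very likely swap thrashing.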
-
You can use memray on a Python task to track down memory usage if you need to.
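memray is typically run from the command line (`memray run my_task.py`, then `memray flamegraph` on the output file). As a dependency-free sketch of the same idea, the standard library's `tracemalloc` can already show which lines accumulate memory:

```python
import tracemalloc

def leaky_step(store):
    # Simulate a task step that keeps allocating without releasing
    store.append(bytes(100_000))

tracemalloc.start()
before = tracemalloc.take_snapshot()

store = []
for _ in range(100):
    leaky_step(store)

after = tracemalloc.take_snapshot()

# Top allocation sites since the first snapshot; a real leak shows up
# as a line whose size_diff keeps growing between snapshots.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

memray goes further than this (native allocations, flame graphs, live mode), but snapshot diffing like the above is often enough to locate a leak in a PythonOperator callable.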
-
Task to solve with Airflow
Provide a pipeline that is triggered by external events (through the Airflow REST API), then a main simulation task is executed taking 1–5 hours. Afterward, a number of post-processing steps are performed on the simulation results, and in a final task the results are uploaded to a database. The pipeline will be triggered several times per hour. It should be as standard as possible to allow for operation and maintenance by non-simulation domain staff, and the post-processing tasks should be flexible enough to be changed without touching the simulation task details. The number of workers should be easy to change, possibly using multiple machines.
For me, Airflow with Celery seems to fulfill all these requirements, and therefore I am trying to set up a demonstration system.
Setup
- Airflow==2.10.3 with CeleryExecutor
- docker-compose stack (from here: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#fetching-docker-compose-yaml)
Problem
The problem I experience is as follows:
- the triggerer container suddenly logs a lot of messages like this: INFO - Triggerer's async thread was blocked for 0.27 seconds, likely a badly-written trigger.
- celery and airflow processes take a high CPU load, while the actual worker process only has a low CPU load and continues extremely slowly; the webpage is also very sluggish
DAG to reproduce the problem:
This is a mock task that reproduces the behavior of the real-world simulation code in terms of memory (here 8 GB, but it also happens with less) and CPU consumption.
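The original mock DAG is not reproduced here; a callable along these lines (function name, defaults, and sizing are my own illustration, not the poster's code) produces comparable memory and CPU pressure when wrapped in a PythonOperator:

```python
import time

def simulate(n_bytes=8 * 1024**3, seconds=3600):
    """Mock simulation task: hold a large buffer (memory pressure)
    and busy-wait (CPU load) for the given duration."""
    buf = bytearray(n_bytes)               # e.g. 8 GB by default
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass                               # burn CPU
    return len(buf)

# Scaled-down smoke test of the mock task
print(simulate(n_bytes=1024, seconds=0.1))  # 1024
```

In the demonstration DAG this would be registered as the `python_callable` of a PythonOperator in a DAG with `schedule=None`, so it only runs when triggered via the REST API.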
Questions
I am happy to provide more details if required, but I think that this DAG should be reproducible with the standard Docker Compose stack.
Memory usage of the machine
The machine is not out of memory when problems start: