Thanks for raising this @wallyqs. We have a plan for how to proceed: the controller will detect OOM-killed containers and cancel the corresponding jobs on Buildkite. We'll let you know when this is implemented. Let us know if there is anything else that should be cleaned up for OOM-killed jobs.
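As a rough illustration of the detection step, here is a minimal sketch that checks a pod's status for containers Kubernetes reports as terminated with reason `OOMKilled`. It assumes the controller has the pod as a JSON-like dict shaped like `kubectl get pod -o json` output; the function name `oom_killed_containers` and the example pod are hypothetical, and this is not the controller's actual implementation.

```python
from typing import Any


def oom_killed_containers(pod: dict[str, Any]) -> list[str]:
    """Return the names of containers whose current state is terminated with reason OOMKilled.

    Hypothetical helper: assumes `pod` mirrors the Kubernetes Pod JSON
    (status.containerStatuses[].state.terminated.reason).
    """
    names = []
    for status in pod.get("status", {}).get("containerStatuses") or []:
        # Only containers in the "terminated" state carry a termination reason.
        terminated = (status.get("state") or {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical pod status resembling the case in this issue.
example_pod = {
    "status": {
        "containerStatuses": [
            {"name": "agent", "state": {"running": {}}},
            {"name": "container-0",
             "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}},
        ]
    }
}

print(oom_killed_containers(example_pod))
```

A controller that finds a non-empty result could then treat the corresponding Buildkite job as failed or cancelled instead of leaving it shown as running.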
Probably clean up the pod and the Job in the cluster as well.
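For the cluster-side cleanup, one option is to delete the Kubernetes Job that owns the pod and let the garbage collector remove its pods. The sketch below finds the owning Job via the pod's `metadata.ownerReferences` and builds a `kubectl delete job ... --cascade=background` command. The helper names and the example pod/Job names are hypothetical; it only builds the command rather than running it, and it is an assumption about one possible approach, not what the controller does.

```python
from typing import Any, Optional


def owning_job(pod: dict[str, Any]) -> Optional[str]:
    """Return the name of the Job that owns this pod, if any (hypothetical helper)."""
    # Pods created by a Job list it in metadata.ownerReferences with kind "Job".
    for ref in pod.get("metadata", {}).get("ownerReferences") or []:
        if ref.get("kind") == "Job":
            return ref.get("name")
    return None


def delete_job_command(pod: dict[str, Any]) -> Optional[list[str]]:
    """Build a kubectl command that deletes the pod's owning Job, or None if there is none."""
    job = owning_job(pod)
    if job is None:
        return None
    namespace = pod.get("metadata", {}).get("namespace", "default")
    # --cascade=background deletes the Job right away; its pods are garbage-collected afterwards.
    return ["kubectl", "delete", "job", job, "--namespace", namespace, "--cascade=background"]


# Hypothetical pod owned by a Job named "buildkite-example-job".
example_pod = {
    "metadata": {
        "namespace": "buildkite",
        "ownerReferences": [{"apiVersion": "batch/v1", "kind": "Job", "name": "buildkite-example-job"}],
    }
}

print(delete_job_command(example_pod))
```

Deleting the Job rather than only the pod avoids the Job controller recreating the pod, subject to the Job's own retry settings.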
For example, a `container-0` job ran into this OOM, so it has already exited, but in the Buildkite UI it still appears to be running.
Maybe the controller needs a way to detect OOM events in the jobs so it can clean them up?