Jobs that ran into OOM issues appear as still running in buildkite UI #182

wallyqs · 2023-07-14T13:36:41Z

For example, a job's container-0 container was OOM killed, so it has already exited:

- containerID: containerd://868661c9da807af9428729518d1c95a52c1bb5efac68df8799cd6b24b475125c
  image: docker.io/library/golang:1.20-alpine
  imageID: docker.io/library/golang@sha256:59fc0dc542a38bb5b94cd1529e5f4663b4e7cc2f4a6c352b826dafe00d820031
  lastState: {}
  name: container-0
  ready: false
  restartCount: 0
  started: false
  state:
    terminated:
      containerID: containerd://868661c9da807af9428729518d1c95a52c1bb5efac68df8799cd6b24b475125c
      exitCode: 137
      finishedAt: "2023-07-14T10:40:18Z"
      reason: OOMKilled
      startedAt: "2023-07-14T10:31:49Z"

But in the Buildkite UI the job still appears as running.

Maybe the controller needs a way to detect OOM events in jobs so it can clean them up?

triarius · 2023-07-19T03:07:01Z

Thanks for raising this @wallyqs. We have a plan for how to proceed. It involves detecting OOM killed containers from the controller and cancelling them on Buildkite. We'll let you know when this is implemented. Let us know if there are more things to clean up for OOM killed jobs that we should catch as well.

calvinbui · 2024-10-16T00:49:52Z

Probably clean up the pod and job in the cluster as well.
