K8s Pod stuck at terminating #11308
Comments
@frezbo can you share just the pod.yaml for reproducer? It can be hard to deduce it from the output of |
Some failures mentioned in the containerd logs:
There are two containers here:
@fvoznika Do you know what could be going on?
It is almost as if the pause container's shim is confused about the container it is dealing with. Or that the nginx container's shim is being invoked even for the pause container.
@frezbo Could you try an older runsc binary? Maybe a few months old. When did you first start experiencing this issue? If this is a breakage in runsc, then we should be able to bisect which commit introduced this regression.
@frezbo: It looks like the issue is the use of containerd v2, which is incompatible with the CRI v1 interface. See: https://github.com/containerd/containerd/blob/main/docs/containerd-2.0.md#whats-breaking Unfortunately, this means we have to write a new shim for the v2 CRI interface. But the good news is we have to do that for GKE, so it will be on our roadmap. In the meantime, consider using containerd 1.7. Also, we ran into this issue which has similar side effects, so we'd recommend a runsc binary ~ 3 months old or one that includes this patch: 6cf66fa
We've been trying different versions of gVisor from June of 2024, siderolabs/extensions#417 (comment). This started when we started using containerd v2.
Ahh I see, so new gvisor and old runsc binary, since I built with a commit after |
I think Zach is referring to #9834, which was a separate issue, not related to containerd v2, but it had a similar impact: Pods get stuck when you try to delete them. Hence it is advisable to use a runsc/gVisor build after 6cf66fa, which mitigates that bug. containerd v2 support in gVisor is separate. @milantracy is looking into it.
Description
Created as per request from here: #9834 (comment)
Logs attached here: runsc.tar.gz
Containerd logs: containerd.log
Steps to reproduce
Created runtimeclass
deploy a pod
after pod is running, try to delete:
pod is stuck at terminating
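For readers without the original manifests, the steps above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual files: the RuntimeClass name `gvisor`, the pod name `nginx-gvisor`, the file name `repro.yaml`, and the `nginx` image are all assumptions, and the handler is assumed to be `runsc`, which has to match whatever runtime handler is registered in the containerd CRI config.

```yaml
# Hypothetical reproducer; names and image are assumptions,
# not the manifests used in this report.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor            # assumed name
handler: runsc            # assumed; must match the handler in the containerd CRI config
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-gvisor      # assumed name
spec:
  runtimeClassName: gvisor  # run this pod's sandbox under the RuntimeClass above
  containers:
    - name: nginx
      image: nginx
# Apply, wait for Running, then delete:
#   kubectl apply -f repro.yaml
#   kubectl delete pod nginx-gvisor
```

With a manifest along these lines, the delete is the step where the pod would be expected to stay in Terminating, as described in this report.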
runsc version
runsc --version
runsc version VERSION_MISSING
spec: 1.1.0-rc.1
Built from the `go` branch at commit 5477640
docker version (if using docker)
uname
Linux talos-default-worker-1 6.12.5-talos #1 SMP Mon Dec 16 13:06:45 UTC 2024 x86_64 Linux
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)
Logs are attached to the main comment, since I cannot attach them here.
CRI config:
runsc toml