K8s Pod stuck at terminating #11308

Closed
frezbo opened this issue Dec 18, 2024 · 8 comments
Labels
type: bug Something isn't working

Comments

@frezbo

frezbo commented Dec 18, 2024

Description

Created as per request from here: #9834 (comment)

Logs attached here: runsc.tar.gz

Containerd logs: containerd.log

Steps to reproduce

Created runtimeclass:

❯ kubectl get runtimeclasses.node.k8s.io gvisor -o yaml
apiVersion: node.k8s.io/v1
handler: runsc
kind: RuntimeClass
metadata:
  creationTimestamp: "2024-12-18T14:04:44Z"
  name: gvisor
  resourceVersion: "584"
  uid: d856b78a-430d-4659-80e0-259974268c20

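For a reproducer, the RuntimeClass above can be recreated with a minimal manifest like the following (a sketch derived from the output; only the name and handler fields are needed):

❯ cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF
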
Deployed a pod:

❯ kubectl get pod nginx-gvisor -o yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-12-18T14:04:44Z"
  deletionGracePeriodSeconds: 30
  deletionTimestamp: "2024-12-18T14:05:26Z"
  name: nginx-gvisor
  namespace: default
  resourceVersion: "610"
  uid: 8a6a4e09-e449-485c-8bb9-9de4de3fd696
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx-gvisor
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-ww2fn
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: talos-default-worker-1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  runtimeClassName: gvisor
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-ww2fn
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-12-18T14:04:56Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-12-18T14:04:44Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-12-18T14:04:56Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-12-18T14:04:56Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-12-18T14:04:44Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:fb197595ebe76b9c0c14ab68159fd3c08bd067ec62300583543f0ebda353b5be
    lastState: {}
    name: nginx-gvisor
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-12-18T14:04:55Z"
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-ww2fn
      readOnly: true
      recursiveReadOnly: Disabled
  hostIP: 10.5.0.3
  hostIPs:
  - ip: 10.5.0.3
  phase: Running
  podIP: 10.244.1.4
  podIPs:
  - ip: 10.244.1.4
  qosClass: BestEffort
  startTime: "2024-12-18T14:04:44Z"
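
The pod itself reduces to a manifest along these lines (a minimal sketch reconstructed from the spec above; everything other than the name, image, and runtimeClassName is left to defaults):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-gvisor
  namespace: default
spec:
  runtimeClassName: gvisor
  containers:
  - name: nginx-gvisor
    image: nginx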

After the pod is running, try to delete it:

kubectl delete pod nginx-gvisor

The pod is stuck at Terminating:

❯ kubectl describe pod nginx-gvisor
Name:                      nginx-gvisor
Namespace:                 default
Priority:                  0
Runtime Class Name:        gvisor
Service Account:           default
Node:                      talos-default-worker-1/10.5.0.3
Start Time:                Wed, 18 Dec 2024 19:34:44 +0530
Labels:                    <none>
Annotations:               <none>
Status:                    Terminating (lasts 99s)
Termination Grace Period:  30s
IP:                        10.244.1.4
IPs:
  IP:  10.244.1.4
Containers:
  nginx-gvisor:
    Container ID:   containerd://690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:fb197595ebe76b9c0c14ab68159fd3c08bd067ec62300583543f0ebda353b5be
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 18 Dec 2024 19:34:55 +0530
      Finished:     Wed, 18 Dec 2024 19:34:58 +0530
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ww2fn (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  kube-api-access-ww2fn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason         Age    From               Message
  ----     ------         ----   ----               -------
  Normal   Scheduled      2m21s  default-scheduler  Successfully assigned default/nginx-gvisor to talos-default-worker-1
  Normal   Pulling        2m20s  kubelet            Pulling image "nginx"
  Normal   Pulled         2m10s  kubelet            Successfully pulled image "nginx" in 9.576s (9.576s including waiting). Image size: 72099501 bytes.
  Normal   Created        2m10s  kubelet            Created container: nginx-gvisor
  Normal   Started        2m10s  kubelet            Started container nginx-gvisor
  Normal   Killing        2m7s   kubelet            Stopping container nginx-gvisor
  Warning  FailedKillPod  6s     kubelet            error killing pod: failed to "KillPodSandbox" for "8a6a4e09-e449-485c-8bb9-9de4de3fd696" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"

runsc version

runsc --version
runsc version VERSION_MISSING
spec: 1.1.0-rc.1


Built from the `go` branch at commit 5477640.

docker version (if using docker)

N/A

uname

Linux talos-default-worker-1 6.12.5-talos #1 SMP Mon Dec 16 13:06:45 UTC 2024 x86_64 Linux

kubectl (if using Kubernetes)

❯ kubectl version
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.0



❯ kubectl get nodes
NAME                           STATUS   ROLES           AGE     VERSION
talos-default-controlplane-1   Ready    control-plane   6m47s   v1.32.0
talos-default-worker-1         Ready    <none>          6m47s   v1.32.0

repo state (if built from source)

No response

runsc debug logs (if available)

Logs are attached to the main comment, since they cannot be attached here.

CRI config:

❯ talosctl -n 10.5.0.3 read /etc/cri/conf.d/cri.toml 
## /etc/cri/conf.d/00-base.part
## /etc/cri/conf.d/01-registries.part
## /etc/cri/conf.d/20-customization.part
## /etc/cri/conf.d/gvisor-debug.part
## /etc/cri/conf.d/gvisor-kvm.part
## /etc/cri/conf.d/gvisor.part

version = 3

[debug]
  level = 'debug'

[plugins]
  [plugins.'io.containerd.cri.v1.images']
    discard_unpacked_layers = true

    [plugins.'io.containerd.cri.v1.images'.registry]
      config_path = '/etc/cri/conf.d/hosts'

      [plugins.'io.containerd.cri.v1.images'.registry.configs]

  [plugins.'io.containerd.cri.v1.runtime']
    [plugins.'io.containerd.cri.v1.runtime'.containerd]
      [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes]
        [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc]
          base_runtime_spec = '/etc/cri/conf.d/base-spec.json'

        [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runsc]
          runtime_type = 'io.containerd.runsc.v1'

          [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runsc.options]
            ConfigPath = '/etc/cri/conf.d/runsc.toml'
            TypeUrl = 'io.containerd.runsc.v1.options'

        [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runsc-kvm]
          runtime_type = 'io.containerd.runsc.v1'

          [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runsc-kvm.options]
            ConfigPath = '/etc/cri/conf.d/runsc-kvm.toml'
            TypeUrl = 'io.containerd.runsc.v1.options'

  [plugins.'io.containerd.runtime.v1.linux']
    shim_debug = true

runsc toml

❯ talosctl -n 10.5.0.3 read /etc/cri/conf.d/runsc.toml 
log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"

[runsc_config]
  debug = "true"
  debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log"
@ayushr2
Collaborator

ayushr2 commented Dec 18, 2024

@frezbo can you share just the pod.yaml for the reproducer? It can be hard to deduce it from the output of kubectl get pod nginx-gvisor -o yaml.

@ayushr2
Collaborator

ayushr2 commented Dec 18, 2024

Some failures mentioned in the containerd logs:

10.5.0.3: {"error":"cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config","level":"error","msg":"failed to load cni during init, please check CRI plugin status before setting up network for pods","time":"2024-12-18T14:01:58.946096167Z"}
10.5.0.3: {"error":"introspecting OCI runtime features needs the runtime type to be \"io.containerd.runc.v2\", got \"io.containerd.runsc.v1\"","level":"debug","msg":"failed to introspect features of runtime \"runsc\"","time":"2024-12-18T14:01:58.946199801Z"}
10.5.0.3: {"error":"introspecting OCI runtime features needs the runtime type to be \"io.containerd.runc.v2\", got \"io.containerd.runsc.v1\"","level":"debug","msg":"failed to introspect features of runtime \"runsc-kvm\"","time":"2024-12-18T14:01:58.946225770Z"}
...
10.5.0.3: {"error":"rpc error: code = Unknown desc = failed to setup network for sandbox \"486a60c5bbacc5425f5c3cddbf015cb76746d8a9cacb40d310e041467823c149\": plugin type=\"flannel\" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory","level":"error","msg":"RunPodSandbox for \u0026PodSandboxMetadata{Name:coredns-578d4f8ffc-chjsq,Uid:f3331665-79df-47d7-af27-4766dcb80f32,Namespace:kube-system,Attempt:0,} failed, error","time":"2024-12-18T14:03:28.197389045Z"}
...
10.5.0.3: {"error":"rpc error: code = DeadlineExceeded desc = failed to stop sandbox \"e4647f8332e28b91d31ad966753a76c313c6f9905524baa8c707b09c253a5fa9\": failed to stop sandbox container \"e4647f8332e28b91d31ad966753a76c313c6f9905524baa8c707b09c253a5fa9\" in \"SANDBOX_READY\" state: context deadline exceeded","level":"error","msg":"StopPodSandbox for \"e4647f8332e28b91d31ad966753a76c313c6f9905524baa8c707b09c253a5fa9\" failed","time":"2024-12-18T14:06:59.141335825Z"}

@ayushr2
Collaborator

ayushr2 commented Dec 18, 2024

There are two containers here:

  • The pause container is e4647f8332e28b91d31ad966753a76c313c6f9905524baa8c707b09c253a5fa9. This is also the Sandbox ID, so the errors above about failing to set up the network are for another sandbox that we are not investigating.
  • The nginx container is 690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a .
  • The nginx container was killed and deleted successfully.
  • I don't see any runsc delete or runsc kill invocations for the pause container.
  • The shim logs do show that the shim got the request to kill the pause container, but it did not invoke runsc kill. This feels wrong:
time="2024-12-18T14:04:59.144521054Z" level=debug msg="Kill, id: e4647f8332e28b91d31ad966753a76c313c6f9905524baa8c707b09c253a5fa9, execID: , signal: 9, all: false"
time="2024-12-18T14:04:59.144608421Z" level=debug msg="Kill succeeded"
  • Right after that, the shim logs show that a request was made for the state of the pause container, but the response contains the state of the nginx container. This looks wrong:
time="2024-12-18T14:06:59.854209290Z" level=debug msg="State, id: e4647f8332e28b91d31ad966753a76c313c6f9905524baa8c707b09c253a5fa9, execID: "
time="2024-12-18T14:06:59.854367960Z" level=debug msg="State succeeded, response: &StateResponse{ID:690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a,Bundle:/run/containerd/io.containerd.runtime.v2.task/k8s.io/690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a,Pid:3330,Status:STOPPED,Stdin:,Stdout:/run/containerd/io.containerd.grpc.v1.cri/containers/690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a/io/757237608/690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a-stdout,Stderr:/run/containerd/io.containerd.grpc.v1.cri/containers/690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a/io/757237608/690733dbd1e1c06579bad8c04ac30f12ad1825daa0c15de02a17a4bc2d092a4a-stderr,Terminal:false,ExitStatus:0,ExitedAt:2024-12-18 14:04:58.717778783 +0000 UTC m=+13.513741255,ExecID:,XXX_unrecognized:[],}"

@fvoznika Do you know what could be going on?

  • Why did the Kill shim operation on the pause container do nothing and just conclude "Kill succeeded"?
  • Why did the State shim operation on the pause container return results for the nginx container?

It is almost as if the pause container's shim is confused about which container it is dealing with, or the nginx container's shim is being invoked even for the pause container.
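
For anyone retracing this from the attached logs, the relevant shim entries can be pulled out with something like the following (the directory layout follows the log_path pattern from the runsc.toml above; the exact path on the node is an assumption):

❯ grep -E 'Kill|State' /var/log/runsc/e4647f8332e28b91d31ad966753a76c313c6f9905524baa8c707b09c253a5fa9/shim.log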

@ayushr2
Collaborator

ayushr2 commented Dec 18, 2024

@frezbo Could you try an older runsc binary? Maybe a few months old. When did you first start experiencing this issue? If this is a breakage in runsc, then we should be able to bisect which commit introduced this regression.

@milantracy milantracy self-assigned this Dec 19, 2024
@zkoopmans
Contributor

@frezbo: It looks like the issue is that you are using containerd v2, which is incompatible with the CRI v1 interface.

See: https://github.com/containerd/containerd/blob/main/docs/containerd-2.0.md#whats-breaking

Unfortunately, this means we have to write a new shim for the v2 CRI interface. But the good news is we have to do that for GKE, so it will be on our roadmap.

In the meantime, consider using containerd 1.7.

Also, we ran into this issue which has similar side effects, so we'd recommend a runsc binary ~3 months old or one that includes this patch: 6cf66fa
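
To confirm which containerd major version a node is running, something like the following works (this assumes crictl is installed on the node and pointed at the containerd socket):

❯ crictl version

A RuntimeVersion of v2.x in the output confirms the containerd 2.0 incompatibility described above.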

@frezbo
Author

frezbo commented Dec 19, 2024

@frezbo Could you try an older runsc binary? Maybe a few months old. When did you first start experiencing this issue? If this is a breakage in runsc, then we should be able to bisect which commit introduced this regression.

We've been trying different versions of gVisor since June 2024 (siderolabs/extensions#417 (comment)); this started when we started using containerd v2.

@frezbo
Author

frezbo commented Dec 19, 2024

Also, we ran into this issue which has similar side effects, so we'd recommend a runsc binary ~3 months old or one that includes this patch: 6cf66fa

Ahh I see, so new gvisor and old runsc binary, since I built with a commit after 6cf66fa
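
To double-check that a local runsc checkout includes that fix (a sketch; 6cf66fa is the commit referenced above, run from the source tree that was built):

❯ git merge-base --is-ancestor 6cf66fa HEAD && echo "includes 6cf66fa"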

@ayushr2
Collaborator

ayushr2 commented Dec 19, 2024

I think Zach is referring to #9834, which was a separate issue, not related to containerd v2, but it had a similar impact: pods get stuck when you try to delete them. Hence it is advisable to use a runsc/gVisor build after 6cf66fa, which mitigates that bug.

containerd v2 support in gVisor is separate. @milantracy is looking into it.

@ayushr2 ayushr2 closed this as completed Dec 24, 2024