Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "<exec_name>": stat <exec_name>: no such file or directory: unknown #4741

Open
mikhatanu opened this issue Nov 7, 2024 · 5 comments

Comments

@mikhatanu

mikhatanu commented Nov 7, 2024

Summary

MicroK8s was running fine yesterday until suddenly every pod went into either CrashLoopBackOff or RunContainerError, with an error log similar to the one in the title:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "<exec_name>": stat <exec_name>: no such file or directory: unknown
where <exec_name> depends on each pod's application.

One of the pod events is:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/coredns": stat /coredns: no such file or directory: unknown
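With every pod failing at once, it can help to pull the missing path out of each event and see which images are affected. A minimal sketch, assuming the events (or containerd journal lines) have been collected as plain strings; `missing_exec_path` is a hypothetical helper, not part of MicroK8s or containerd.

```python
import re
from typing import Optional

# Matches the runc fragment seen in these events, e.g.
#   exec: "/coredns": stat /coredns: no such file or directory
# It also tolerates backslash-escaped quotes as printed in containerd's
# journal output (exec: \"/pause\") and a missing colon after "exec"
# in hand-copied messages.
_EXEC_RE = re.compile(r'exec:? \\?"([^"\\]+)\\?": stat [^:]+: no such file or directory')


def missing_exec_path(message: str) -> Optional[str]:
    """Return the entrypoint path runc could not stat, or None if the message doesn't match."""
    match = _EXEC_RE.search(message)
    return match.group(1) if match else None
```

If stock images such as coredns and pause fail the same way as your own images, the problem is more likely on the host than in any individual image.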


containerd-template.toml:

# Use config version 2 to enable new configuration fields.
version = 2
oom_score = 0

[grpc]
  uid = 0
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[debug]
  address = ""
  uid = 0
  gid = 0

[metrics]
  address = "127.0.0.1:1338"
  grpc_histogram = false

[cgroup]
  path = ""

# The 'plugins."io.containerd.grpc.v1.cri"' table contains all of the server options.
[plugins."io.containerd.grpc.v1.cri"]

  stream_server_address = "127.0.0.1"
  stream_server_port = "0"
  enable_selinux = false
  sandbox_image = "registry.k8s.io/pause:3.9"
  stats_collect_period = 10
  enable_tls_streaming = false
  max_container_log_line_size = 16384

  # 'plugins."io.containerd.grpc.v1.cri".containerd' contains config related to containerd
  [plugins."io.containerd.grpc.v1.cri".containerd]

    # snapshotter is the snapshotter used by containerd.
    snapshotter = "overlayfs"

    # no_pivot disables pivot-root (linux only), required when running a container in a RamDisk with runc.
    # This only works for runtime type "io.containerd.runtime.v1.linux".
    no_pivot = false

    # default_runtime_name is the default runtime name to use.
    default_runtime_name = "runc"

    # 'plugins."io.containerd.grpc.v1.cri".containerd.runtimes' is a map from CRI RuntimeHandler strings, which specify types
    # of runtime configurations, to the matching configurations.
    # In this example, 'runc' is the RuntimeHandler string to match.
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      # runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
      runtime_type = "io.containerd.runc.v1"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-container-runtime]
      # runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
      runtime_type = "io.containerd.runc.v1"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-container-runtime.options]
        BinaryName = "nvidia-container-runtime"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
      runtime_type = "io.containerd.kata.v2"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
        BinaryName = "kata-runtime"

  # 'plugins."io.containerd.grpc.v1.cri".cni' contains config related to cni
  [plugins."io.containerd.grpc.v1.cri".cni]
    # bin_dir is the directory in which the binaries for the plugin are kept.
    bin_dir = "/var/snap/microk8s/7229/opt/cni/bin"

    # conf_dir is the directory in which the admin places a CNI conf.
    conf_dir = "/var/snap/microk8s/7229/args/cni-network"

  # 'plugins."io.containerd.grpc.v1.cri".registry' contains config related to the registry
  [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/var/snap/microk8s/7229/args/certs.d"

MicroK8s v1.31.1, revision 7229 (edit: upgrading to v1.31.2, revision 7394, did not solve it)
containerd v1.6.28 (client and server)
calico v3.25.1

What Should Happen Instead?

Everything should work normally.

Reproduction Steps

microk8s stop
microk8s start

Introspection Report

inspection-report-20241107_162205.tar.gz

Can you suggest a fix?

Are you interested in contributing with a fix?

@mikhatanu
Author

After running chmod 777 on /mnt/<nfs_mount>, some of the pods started working normally. I don't know how this fixed some of them. Some pods still fail with: Error: failed to create "containerd" task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec:

@claudiubelu
Contributor

Hello,

Hm, I'm seeing some other errors in your containerd logs as well, which are interesting:

Nov 07 15:49:04 devnode microk8s.daemon-containerd[43022]: time="2024-11-07T15:49:04.009457956+07:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kai-beta-dms-worker-75965bbf87-vnpvw,Uid:060e839f-b0db-40ff-afd7-62021477dc61,Namespace:kai-beta-dms,Attempt:14,} failed, error" error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: \"/pause\": stat /pause: no such file or directory: unknown"

That binary should come from the pause image. I see registry.k8s.io/pause:3.9 is set as the sandbox image (though I also see references to registry.k8s.io/pause:3.7 in the logs). It's a simple image with a simple binary, meant to just exist indefinitely; it shouldn't be missing or have any dependencies. Something might be going on with the host itself. Out of curiosity, what CPU architecture do you have: amd64, arm64, or something else? Have you run anything that would "emulate" other platforms, like qemu or binfmt (https://github.com/tonistiigi/binfmt)?
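One way to check the architecture question directly is to read the `e_machine` field from the ELF header of a binary inside the image (for example the pause binary, once extracted from the image filesystem) and compare it with the host's `uname -m`. A minimal sketch; the helper names are hypothetical, and the machine table covers only a few common values from the ELF specification.

```python
import struct

# A few e_machine values from the ELF specification; not an exhaustive list.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_arch(header: bytes) -> str:
    """Return the architecture named in an ELF header (needs at least the first 20 bytes)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit headers.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def elf_arch_of_file(path: str) -> str:
    with open(path, "rb") as fh:
        return elf_arch(fh.read(20))
```

On an x86_64 host, `elf_arch_of_file(...)` returning `"x86_64"` for the extracted binary would rule out a platform mismatch as the cause.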

Can you try spawning a simple container? Try the following (note that pause produces no output, but ideally it won't print any errors either; you can press CTRL+C afterwards, and you should see "Shutting down, got signal: Interrupt"):

microk8s ctr image pull registry.k8s.io/pause:3.9
microk8s ctr run registry.k8s.io/pause:3.9 foo

What about:

docker run --rm -ti registry.k8s.io/pause:3.9

@mikhatanu
Author

mikhatanu commented Nov 19, 2024

The architecture is x86_64, running RHEL 8.7 (Ootpa).

Running microk8s ctr run registry.k8s.io/pause:3.9 foo returns "Shutting down, got signal: Interrupt".

This cluster previously encountered multiple errors that were fixed by other people. I'm the one fixing this error now, so I don't really know what the others did to fix the previous errors. Also, the /var folder gets chmod'ed to 777 on every server restart. I have also tried chmod -R 777 on the /var/snap/microk8s/common/var/lib/containerd and /var/snap/microk8s/common/run/ folders, but I still get the same error.

pause originally used version 3.7; I changed it to 3.9 to fix a different error.

@mikhatanu
Author

Everything works fine now after the following changes, any of which might have fixed it:

  1. Disabling the Kaspersky Endpoint Agent
  2. Upgrading the container image (the image worked fine on 1.25, but there is probably a breaking change in versions above 1.29)
  3. Running chmod -R 0755 /var
  4. Upgrading snap to 2.65.1-0.el8
  5. Refreshing the MicroK8s certs
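Several of these steps, like the earlier chmod 777 attempts, change directory modes under /var. A quick way to confirm what the modes actually ended up as is to walk every component of a path, from / downwards, and print its mode, flagging any directory that other users cannot traverse. A minimal sketch using only the standard library; `audit_path` is a hypothetical helper, and treating "others lack execute on a directory" as the interesting condition is an assumption, not a rule taken from containerd or MicroK8s.

```python
import os
import stat
from pathlib import Path


def audit_path(path: str) -> list[tuple[str, str, bool]]:
    """Return (component, mode string, traversable_by_others) for / down to `path`."""
    results = []
    target = Path(path).resolve()  # follows symlinks such as /var/snap/microk8s/current
    # target.parents is nearest-first; reverse it so the walk starts at /.
    for p in [*reversed(target.parents), target]:
        st = os.stat(p)
        # Non-directories don't need the search bit, so only directories can fail the check.
        ok = not stat.S_ISDIR(st.st_mode) or bool(st.st_mode & stat.S_IXOTH)
        results.append((str(p), stat.filemode(st.st_mode), ok))
    return results
```

For example, `for comp, mode, ok in audit_path("/var/snap/microk8s/common/var/lib/containerd"): print(mode, comp, "" if ok else "<- others cannot traverse")`.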

@mikhatanu
Author

mikhatanu commented Dec 2, 2024

The error is back with a similar message:

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/opt/mendix/entrypoint": stat /opt/mendix/entrypoint: no such file or directory: unknown

Edit: this is currently happening because the image was not rebuilt after one of the fixes listed above. I will post more updates after solving this issue.
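Since this points at an image that was not rebuilt, one way to confirm is to check whether the entrypoint path exists in any layer of the image. A minimal sketch, assuming the image has been exported with `docker save -o image.tar <image>`, which writes a `manifest.json` listing each layer tarball; `layers_containing` is a hypothetical helper. It does not interpret whiteout (`.wh.*`) files, so a path deleted by a later layer would still be reported as present.

```python
import json
import tarfile


def _normalize(name: str) -> str:
    # Layer members may be stored as "opt/x", "./opt/x" or "/opt/x"; compare without prefixes.
    while name.startswith("./"):
        name = name[2:]
    return name.lstrip("/")


def layers_containing(image_tar: str, path: str) -> list[str]:
    """Return the manifest layer entries whose tarball contains `path` (whiteouts ignored)."""
    wanted = _normalize(path)
    hits = []
    with tarfile.open(image_tar) as outer:
        manifest = json.load(outer.extractfile("manifest.json"))
        for image in manifest:
            for layer in image["Layers"]:
                # extractfile follows in-archive links; mode "r:*" also accepts compressed layers.
                with tarfile.open(fileobj=outer.extractfile(layer)) as inner:
                    if any(_normalize(m.name) == wanted for m in inner.getmembers()):
                        hits.append(layer)
    return hits
```

An empty result for `/opt/mendix/entrypoint` would confirm that the image being deployed does not contain the entrypoint at all.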
