Error stopping k8s container on containerd #7952

Open
mook-as opened this issue Dec 16, 2024 · 8 comments
Labels
area/kubernetes k8s and related, like traefik kind/bug Something isn't working qase Issue is related to manual test in Qase runtime/containerd

Comments

@mook-as
Contributor

mook-as commented Dec 16, 2024

Actual Behavior

When using the containerd container engine, attempting to stop a container that belongs to a pod displays an error:
[image: error dialog]
The same error occurs when stopping the container manually:

> nerdctl -n k8s.io stop 12dbee4b591a
FATA[0000] 1 errors:
unable to cleanup network for container: 12dbee4b591a

Steps to Reproduce

  1. Start Rancher Desktop with the containerd backend and Kubernetes enabled.
  2. Create a pod:

    kubectl create deployment nginx-test --image=nginx:stable

  3. Open the Rancher Desktop main window and navigate to the Containers tab.
  4. In the Namespace drop-down near the top right, select k8s.io as the namespace.
  5. Locate the nginx container (not the pause one), and click on the ⋮ button on the right side.
  6. Click on Stop.

Result

See the Actual Behavior section.

Expected Behavior

The container should be stopped. (Kubernetes may end up restarting it.)

Additional Information

Found while testing Qase test case RD-185.

Rancher Desktop Version

1.17.0-hackweek-release-254-g77398647d (1.17.0-RC1)

Rancher Desktop K8s Version

1.31.3

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

Windows

Operating System / Build Version

Windows 11 Pro 23H2 (Build 22631.4602)

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

No response

@mook-as mook-as added kind/bug Something isn't working runtime/containerd area/kubernetes k8s and related, like traefik labels Dec 16, 2024
@jandubois
Member

This might have the same root cause as containerd/nerdctl#3765

@apostasie

This might have the same root cause as containerd/nerdctl#3765

Almost certainly.

@apostasie

This might have the same root cause as containerd/nerdctl#3765

It would be lovely if someone could confirm that the fix in containerd/nerdctl#3771 addresses it in this case.

@mook-as mook-as added the qase Issue is related to manual test in Qase label Dec 17, 2024
@jandubois
Member

Testing with nerdctl version 2.0.2-20-gadfa1760 still doesn't work:

time="2024-12-17T11:08:01-08:00" level=warning msg="Unable to read network annotation: this container was probably not started with nerdctl.No networking cleanup will be performed, which may likely result in a broken state for the other systems you used to manage these containers.Mixing completely different stacks to manage containers lifecycle is not recommended." error="unexpected end of JSON input"

@apostasie

apostasie commented Dec 17, 2024

Testing with nerdctl version 2.0.2-20-gadfa1760 still doesn't work:

time="2024-12-17T11:08:01-08:00" level=warning msg="Unable to read network annotation: this container was probably not started with nerdctl.No networking cleanup will be performed, which may likely result in a broken state for the other systems you used to manage these containers.Mixing completely different stacks to manage containers lifecycle is not recommended." error="unexpected end of JSON input"

@jandubois

This is no longer a hard error. It is now a warning (message is subject to change), and the kill (or stop) should proceed. Can you confirm whether that is the case?

Thanks!

@jandubois
Member

jandubois commented Dec 17, 2024

This is no longer a hard error. It is now a warning (message is subject to change), and the kill (or stop) should proceed.

It may be an issue with the calling code, but I continue to get an error dialog, so it still is a regression from earlier releases:

[image: CleanShot 2024-12-17 at 13 01 32@2x]

When I run it from a terminal, the stop seems to work, but the warning is definitely concerning, especially the "unexpected end of JSON input" part:

$ nerdctl -n k8s.io stop 08f71719c7c6
time="2024-12-17T13:16:46-08:00" level=warning msg="Unable to read network annotation: this container was probably not started with nerdctl.No networking cleanup will be performed, which may likely result in a broken state for the other systems you used to manage these containers.Mixing completely different stacks to manage containers lifecycle is not recommended." error="unexpected end of JSON input"
08f71719c7c6

But once stopped, I still cannot remove the container:

$ nerdctl -n k8s.io rm 08f71719c7c6
FATA[0000] 1 errors:
failed to load container networking options from specs: unexpected end of JSON input
Error: exit status 1

$ nerdctl -n k8s.io rm -f 08f71719c7c6
ERRO[0000] 1 errors:
failed to load container networking options from specs: unexpected end of JSON input
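
For context, "unexpected end of JSON input" is the message Go's encoding/json produces when asked to decode empty input, which is consistent with the network annotation being absent or empty on containers that Kubernetes created rather than nerdctl. Below is a minimal sketch of tolerant handling, written in Python for brevity (nerdctl itself is written in Go). The label key and the function name are hypothetical and are not taken from nerdctl's source.

```python
import json
from typing import Optional

# Hypothetical label key, for illustration only; not taken from nerdctl's source.
NETWORK_LABEL = "example/networks"


def load_network_options(labels: dict) -> Optional[list]:
    """Return the decoded network list, or None when no cleanup is possible."""
    raw = labels.get(NETWORK_LABEL, "")
    if not raw.strip():
        # Missing or empty annotation: the container was probably created by
        # another tool, so there is nothing for this sketch to clean up.
        return None
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        # Present but malformed annotation: also treated as "no cleanup".
        return None
    # Only a JSON list counts as a usable annotation in this sketch.
    return decoded if isinstance(decoded, list) else None
```

With this shape, a caller can skip network cleanup when the result is None and still go on to stop or remove the container, instead of aborting on the decode error.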

@jandubois
Member

@apostasie I think it would be best if nerdctl could take an extra option on the stop, kill, rm, etc. commands that says: "I know you did not create these containers, but please clean them up for me anyway". That way you can get rid of the ugly warning as well, or turn it into a real error when the user didn't provide the --i-know-what-i-am-asking option.

I have no serious suggestions for the option name, unfortunately. --allow-foreign-container or something like that?

On a further note (and this should be discussed in the nerdctl repo): should nerdctl ps list containers that it cannot manipulate? Or should they be filtered out too, unless you specify the --allow-foreign-containers option? Otherwise it seems a bit inconsistent.

@apostasie

@jandubois I moved the nerdctl part of the conversation to the PR.

As for:

It may be an issue with the calling code, but I continue to get an error dialog, so it still is a regression from earlier releases:

I am not familiar with Rancher, so I cannot advise.
It does feel bizarre that a simple warning on stderr would pop up a dialog, though. You might want to look into that here (we might also just downgrade that to an info, so...)
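
One way calling code could avoid treating a stderr warning as a failure is to decide on the exit status and use stderr only to classify severity. The sketch below is an assumption about how that might look, not Rancher Desktop's actual code: the function names are hypothetical, it assumes a successful stop exits with status 0, and it only recognizes the two log shapes that appear in this thread (`level=warning ...` and `FATA[0000] ...`).

```python
import re

# Matches text-format log lines such as: time="..." level=warning msg="..."
LEVEL_RE = re.compile(r"\blevel=(\w+)")
# Matches the short format seen in this thread, e.g. "FATA[0000] 1 errors:"
SHORT_RE = re.compile(r"^(DEBU|INFO|WARN|ERRO|FATA|PANI)\[\d+\]")
SHORT_NAMES = {
    "DEBU": "debug", "INFO": "info", "WARN": "warning",
    "ERRO": "error", "FATA": "fatal", "PANI": "panic",
}


def stderr_levels(stderr: str) -> list:
    """Collect the log level of every recognizable line on stderr."""
    levels = []
    for line in stderr.splitlines():
        match = LEVEL_RE.search(line)
        if match:
            levels.append(match.group(1).lower())
            continue
        match = SHORT_RE.match(line)
        if match:
            levels.append(SHORT_NAMES[match.group(1)])
    return levels


def summarize(returncode: int, stderr: str) -> str:
    """Classify a finished command; only a non-zero exit counts as failure."""
    if returncode != 0:
        return "failed"
    if "warning" in stderr_levels(stderr):
        return "ok_with_warnings"
    return "ok"
```

A UI built on this could show an error dialog only for "failed" and surface "ok_with_warnings" in a log view or a non-blocking notice.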

3 participants