Cluster outage after patch update 1.28.14 -> 1.28.15 #4750

Open
slapcat opened this issue Nov 20, 2024 · 0 comments

slapcat commented Nov 20, 2024

Summary

On November 9 we experienced an outage on Charmed MicroK8s 1.28/stable when the snap automatically refreshed from 1.28.14 to 1.28.15. Several kube-system pods restarted and failed to come back up, remaining in either Pending or CrashLoopBackOff. The affected pods were spread across all 3 nodes in the cluster, so the problem was not limited to a single node. We also noticed that all the Calico pods were deleted and recreated (not just restarted). They also failed to come up, which may be what caused the issues with the other pods.

What Should Happen Instead?

Patch updates should not cause service disruptions.

Reproduction Steps

Cannot reproduce since this was an automatic snap update.

Introspection Report

We did not collect one before resolving the incident, but this is the journal log from around the time of the refresh (17:07): https://pastebin.canonical.com/p/rV43qDMPw2/

After the refresh, I see many "task not found" errors like these:

Nov 09 17:36:47 microk8s-1 microk8s.daemon-kubelite[1826546]: E1109 23:36:47.015909 1826546 manager.go:1106] Failed to create existing container: /kubepods/besteffort/pod376b187f-f1cb-426d-b39d-130687311b1d/a7d239e3b3b5933c4e5cee3a21da35ea653acbbc8c4e604dd9852309cd89e508: task a7d239e3b3b5933c4e5cee3a21da35ea653acbbc8c4e604dd9852309cd89e508 not found: not found
Nov 09 17:36:50 microk8s-1 microk8s.daemon-kubelite[1826546]: E1109 23:36:50.128747 1826546 manager.go:1106] Failed to create existing container: /kubepods/besteffort/pod376b187f-f1cb-426d-b39d-130687311b1d/84ab7836363e64c15484a9eda4e0aad4fdcaff7b4b08d42a1fe161e43631a5a3: task 84ab7836363e64c15484a9eda4e0aad4fdcaff7b4b08d42a1fe161e43631a5a3 not found: not found
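
To get a rough idea of how many pods and containers are affected by these errors, the journal lines can be grouped by pod UID. The following is a minimal sketch, not a supported tool: the regular expression is derived only from the two log lines quoted above (cgroupfs-style `/kubepods/...` paths), and the names `TASK_NOT_FOUND`, `missing_tasks_by_pod` and `SAMPLE` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: the pattern is inferred from the two kubelite lines quoted in
# this issue. Other cgroup layouts (e.g. systemd-style slices) will not match.
TASK_NOT_FOUND = re.compile(
    r"Failed to create existing container: "
    r"/kubepods/(?:[a-z]+/)?pod(?P<pod>[0-9a-f-]+)/(?P<container>[0-9a-f]+): "
    r"task (?P<task>[0-9a-f]+) not found"
)

def missing_tasks_by_pod(lines):
    """Group container (task) IDs reported as 'not found' by pod UID."""
    pods = defaultdict(set)
    for line in lines:
        match = TASK_NOT_FOUND.search(line)
        if match:
            pods[match.group("pod")].add(match.group("task"))
    return dict(pods)

# The two journal lines from this report, used as sample input.
SAMPLE = [
    "Nov 09 17:36:47 microk8s-1 microk8s.daemon-kubelite[1826546]: E1109 23:36:47.015909 1826546 manager.go:1106] Failed to create existing container: /kubepods/besteffort/pod376b187f-f1cb-426d-b39d-130687311b1d/a7d239e3b3b5933c4e5cee3a21da35ea653acbbc8c4e604dd9852309cd89e508: task a7d239e3b3b5933c4e5cee3a21da35ea653acbbc8c4e604dd9852309cd89e508 not found: not found",
    "Nov 09 17:36:50 microk8s-1 microk8s.daemon-kubelite[1826546]: E1109 23:36:50.128747 1826546 manager.go:1106] Failed to create existing container: /kubepods/besteffort/pod376b187f-f1cb-426d-b39d-130687311b1d/84ab7836363e64c15484a9eda4e0aad4fdcaff7b4b08d42a1fe161e43631a5a3: task 84ab7836363e64c15484a9eda4e0aad4fdcaff7b4b08d42a1fe161e43631a5a3 not found: not found",
]

if __name__ == "__main__":
    for pod_uid, tasks in missing_tasks_by_pod(SAMPLE).items():
        print(f"pod {pod_uid}: {len(tasks)} container(s) with missing tasks")
```

For the two lines above, both errors belong to the same pod UID with two distinct container IDs. To run this against a full journal export, replace `SAMPLE` with the lines of the saved log file.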

Can you suggest a fix?

Rebooting the nodes one-by-one resolved the issue.

Are you interested in contributing with a fix?

@ktsakalozos
