Kured says reboot not required even though a reboot-required file is present on the Kubernetes cluster Linux node #787
Comments
Do we need cluster auto-upgrade enabled with node-image for kured to work?
Hi @deepaknani007,
Same problem in a kubeadm-deployed cluster with Flatcar stable:
time="2023-08-11T10:01:27Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
Later I can see "reboot not required".
Okay, thanks for this information. We will do a release later this month, when Kubernetes releases its next minor version. It will include #806, which adds a warn-log when the sentinel-check command exits with an unexpected code (a non-zero exit code other than 1). Maybe something is crashing on your hosts; that would cause kured to skip reboots.
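To make the exit-code convention concrete: the sentinel check is a command such as `test -f /var/run/reboot-required`, where exit code 0 means the file exists (reboot required) and 1 means it does not. A minimal Go sketch of how such a result might be interpreted, assuming that any other exit code is treated as "no reboot" but flagged as unexpected (the case the #806 warn-log covers). `checkSentinel` and `sentinelResult` are hypothetical names, not kured's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// sentinelResult describes how a sentinel command's outcome is interpreted.
type sentinelResult struct {
	RebootRequired bool
	Unexpected     bool // exit code other than 0 or 1: worth a warning
	ExitCode       int
}

// checkSentinel runs the given command and interprets its exit code:
// 0 means a reboot is required, 1 means no reboot is required, and any
// other code (including -1 when killed by a signal) means "no reboot"
// but is flagged as unexpected so it can be logged.
func checkSentinel(name string, args ...string) (sentinelResult, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return sentinelResult{RebootRequired: true, ExitCode: 0}, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		code := exitErr.ExitCode()
		return sentinelResult{ExitCode: code, Unexpected: code != 1}, nil
	}
	// The command could not be started at all (e.g. a wrong binary path).
	return sentinelResult{}, err
}

func main() {
	res, err := checkSentinel("test", "-f", "/var/run/reboot-required")
	if err != nil {
		fmt.Println("sentinel command failed to start:", err)
		return
	}
	if res.Unexpected {
		fmt.Printf("warning: sentinel exited with unexpected code %d\n", res.ExitCode)
	}
	fmt.Println("reboot required:", res.RebootRequired)
}
```

The design choice worth noting is the fail-safe default: an unexpected exit code never triggers a reboot, which is exactly why a silently crashing or blocked command shows up only as "reboot not required" plus the warning.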
In my case, the reboot eventually started without my changing anything. I don't know why; if I find something, I will report back.
A new Flatcar release, and the same problem: the /var/run/reboot-required file exists, but no reboots happen, even though the Grafana dashboard shows that the nodes need to be rebooted.
@jorgelon Do you see the following warn-log in the kured pod logs?
This was added in 1.14.0. The problem with host commands is that we don't know what happens on the host: when the command crashes with an unexpected error or is blocked by security tools (e.g. aquasec), this warn-log is the only indicator. Maybe you can check your host logs for anomalies around the times the check runs.
Nope @ckotzbauer |
Okay, that's unfortunate. Then it will be very hard to figure out why the file is not detected. Kured logs the output of the "test -f" command and logs a warning when the exit code is unexpected. So it seems that the command either crashes silently (maybe something is logged in the syslog) or simply exits with the code indicating that no reboot is required, even though the file exists. We will land some bigger security improvements in 1.15.0: we will mount the directory containing the reboot file as a host mount and do a "normal" existence check without "nsenter", which should work more smoothly. But 1.15.0 will be released after Kubernetes 1.29.0 (so in December).
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days). |
Update with 1.15.0: no change. Inside a kured pod on a node where /var/run/reboot-required is present:
/tmp # /usr/bin/nsenter -m/proc/1/ns/mnt -- test -f /var/run/reboot-required
I have tried using
Now I get:
/ # /usr/bin/nsenter -m/proc/1/ns/mnt -- test -f /var/run/reboot-required
The kured-1.15.0-dockerhub.yaml manifest does not mount anything from the host. Still no reboots.
Thanks, @jorgelon, for coming back to this thread. You have two options now:
Right now I am using the Helm chart, with the default values.yaml, to see if I get different results. But nothing happens: no reboot, no log, no annotations. My question is how /bin/systemctl reboot is performed if that binary does not exist in the kured pods.
The binary is not called inside the pod; it's called with
Same problem here with the kured 5.5.0 Helm chart, but it works on another cluster where ghcr.io/kubereboot/kured:1.14.0 is installed from the manifest file.
I'll point to my comment in issue #952, which seems closely related. Any solution?
I deployed the latest release of kured (1.13.1) onto an Azure Kubernetes cluster running Kubernetes v1.26.3 almost a month ago. I don't see any reboot-required file created on the nodes, so I created a dummy "reboot-required" file in the /var/run path on all nodes of the cluster. Unfortunately, the nodes are not rebooting, and the kured pod logs say reboot not required.
Creating the dummy /var/run/reboot-required file:
Kured pod logs: