
Running custom script before node reboot #992

Open
DimitarStefanovMihov opened this issue Oct 15, 2024 · 3 comments


DimitarStefanovMihov commented Oct 15, 2024

Hello, kured team,
I am running kured:1.16.0 on my Kubernetes cluster, which hosts a number of different databases (postgres, redis, mongo) on different nodes. Before I let kured do its magic, I need to execute a script that steps down the masters of those databases:
--reboot-sentinel-command=nsenter --target 1 --mount --uts --ipc --net /usr/bin/python3 /var/run/pre-reboot-script.py - the script returns 0 if all is okay, and the lock-and-reboot process may continue.

The problem is that I need to automate the whole process. To do that, I copy the script into /var/run/. Unfortunately, kured on each node starts looking for the script and trying to execute it, which turns into a big mess, because nodes start trading database masters. I tried combining the script with the actual reboot command in:
--reboot-command - this way the script would surely be executed on one node at a time. But the reboot command didn't work that way.
Can --reboot-command combine several commands together (even in a python script)?
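(For context, a common shell pattern for chaining commands is to wrap them in a single sh -c invocation, so the whole sequence can be passed as one command; this is only a sketch, and the script path and final reboot invocation below are assumptions, not something kured documents for this exact use case.)

```shell
# Hypothetical: run the pre-reboot script first, and only reboot if it exits 0.
# Adjust the script path and reboot invocation for your setup.
--reboot-command="/bin/sh -c '/usr/bin/python3 /var/run/pre-reboot-script.py && systemctl reboot'"
```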

I decided to put a manual lock at the beginning of my script: I put a label on the node that is visible to all other nodes and to the scripts running there. This way, whichever kured starts the script on a node also puts a label on that node; the other scripts see it and don't continue executing. The problem came with the postRebootNodeLabels flag: I couldn't make it remove the label I put at the beginning, so that the node would be released and the other scripts could run against their nodes.

Is it possible for the postRebootNodeLabels flag to remove labels?
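(As an aside, kubectl itself can remove a label with the trailing-dash syntax, independently of kured; the node and label names below are made up for illustration.)

```shell
# Acquire a manual "lock" by labelling the node (hypothetical names):
kubectl label node worker-1 example.com/db-master-lock=held
# Release it: a trailing '-' on the key removes the label.
kubectl label node worker-1 example.com/db-master-lock-
```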

Next, I tried putting an annotation on the daemonset that controls the kured pods - weave.works/kured-node-lock: - which is the official lock kured uses, and I know it disappears after the node reboots. This way it would act as a manual lock at the beginning of my script; when the official lock from kured takes over, it puts whatever info it needs there and deletes the annotation after rebooting, allowing other nodes to take the lock and continue their scripts. But each time, kured returned a parsing error saying it expected a different character at some position in the word I used as the lock value.

Is it possible to manually add an annotation to the daemonset so that it works with kured's lock?
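(For reference: kured's lock annotation holds a JSON value rather than a plain word, which would explain parse errors on an arbitrary string. The kured README describes manually blocking kured by setting it to a JSON object; the namespace and daemonset name below assume a default-style install.)

```shell
# Manually acquire kured's lock (the value must be valid JSON):
kubectl -n kube-system annotate ds kured weave.works/kured-node-lock='{"nodeID":"manual"}'
# Release it by deleting the annotation with the trailing '-' syntax:
kubectl -n kube-system annotate ds kured weave.works/kured-node-lock-
```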

Overall, is there another way to run scripts before lock-drain-reboot, or after the reboot itself (or just to run simple commands after reboot)?

Thank you for your time, and thank you for this great tool!

@DimitarStefanovMihov (Author)

EDIT: I was able to achieve a manual lock with the annotateNodes flag, but if you could answer my previous questions, I would very much appreciate it.

@evrardjp evrardjp self-assigned this Oct 18, 2024
evrardjp (Collaborator) commented Oct 19, 2024

I had a tough time trying to understand your goals. I am not sure I can answer your questions without some clarification first.

First, the sentinel: whether it's a command or a file's presence, it should only be used to determine whether a reboot is required.
Don't overcomplicate it; it's going to be a pain later.

Second, the blockers: it looks like what you're trying to do is prevent a node from rebooting while it's the master. Kured can already prevent a reboot if a certain pod is present. I suppose there is a way you can use a filter to prevent the active master from rebooting. I think it's a bad idea - overall it won't help you to have the master blocked for reboots - but it's possible.

Assuming you go the clean route of not blocking when a database exists, you have PDBs. What's the problem with cordoning and draining the master? Isn't your database recovering from the drain? Don't you have an operator handling the database state? This should be the way. Kured should not "compensate" for something that's outside its job.

Yet, if you still want to do it, there is the reboot command. Keep in mind it happens after the drain/cordon. Here it seems that you want to ensure, if a pod on that node is a master, that the pod gracefully stops before the rest of the work happens. This is totally doable. Push a script onto the host for kured's use, and point your kured daemonset to the script, which will be executed with nsenter -m /proc/1/ns/mnt -- yourexecutable. I don't see where the mess is (outside the fact of using a script at all, especially because this is something better suited for a controller). If you have anti-affinity on nodes and only certain nodes host the databases, then make sure the script is only on those nodes, and configure the daemonset accordingly. But it's most likely error prone, should something be scheduled on another node one day...
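(The exit-code contract described here - step down any local masters, then exit 0 only when it's safe to proceed - can be sketched as a small script. Everything below is a stub: the master-detection and step-down functions are placeholders for real postgres/redis/mongo calls.)

```python
import sys

def has_local_master() -> bool:
    """Placeholder: query postgres/redis/mongo on this node for a master role."""
    return False  # stubbed for illustration

def step_down_masters() -> bool:
    """Placeholder: trigger a stepDown/failover; return True on success."""
    return True  # stubbed for illustration

def main() -> int:
    # Exit 0 only when no masters remain on this node, so the caller
    # (e.g. a wrapper around the reboot command) can proceed safely.
    if has_local_master() and not step_down_masters():
        return 1  # masters still present: abort the reboot
    return 0

if __name__ == "__main__":
    sys.exit(main())
```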

As you can see, I am really confused about the problem you're hitting...

@evrardjp evrardjp removed their assignment Oct 19, 2024
@DimitarStefanovMihov (Author)

Apologies for the unclear writing. I will try my best to clarify.

First, the sentinel: whether it's a command or a file present, it should only be used to determine if reboot is required. - unfortunately, I need it to do more than just determine whether a reboot is required, so I need to push kured to the limits of its abilities.

Second, the blockers: - no, here I am NOT trying to prevent a node from rebooting, but rather to lock that node manually before the drain/cordon, so that I can run my script and make sure the current node no longer hosts any masters; then I can safely return status 0 and the drain/cordon -> reboot can continue.

Assuming you go the clean route of not blocking in case a database exists, you have PDBs. What's the problem with cordoning and draining the master? - I don't know the exact reason, but I am not allowed to drain/evict pods while they are masters (it is a human restriction, not a Kubernetes one).
