improve security of reboot mechanism #416

Closed
rptaylor opened this issue Jul 28, 2021 · 29 comments · Fixed by #814
Labels: keep (won't be closed by the stale bot), security

@rptaylor

rptaylor commented Jul 28, 2021

Hello,

We would really like to use kured but there is reluctance about a fully privileged daemonset that has access to execute arbitrary commands as root on every node. I have reviewed some options that could allow rebooting more securely.

  • Option 0 (current kured behaviour): nsenter and execute a configurable shutdown command (/bin/systemctl reboot by default) in the host namespace to reboot. Requires full privileges.
  • Option 1: grant CAP_SYS_BOOT to kured to allow the reboot(2) syscall.
    • Option 1A, semi-dirty reboot: call sync() and then reboot(). This should avoid any filesystem corruption, but systemd will not be aware of the reboot, so it is not very friendly to system services. This was discussed in "Support unprivileged container" (#172).
    • Option 1B, graceful reboot via Ctrl Alt Del: invoke the reboot syscall with LINUX_REBOOT_CMD_CAD_OFF, then issue a CtrlAltDel keystroke to trigger a graceful systemd-managed shutdown via the ctrl-alt-del.target unit file.
  • Option 2: graceful reboot via a kill signal. Grant CAP_KILL to kured so it can send SIGRTMIN+5 to PID 1 (systemd), which is an equivalent way to achieve a graceful systemd-managed reboot (without relying on the ctrl-alt-del.target unit, thanks to evrardjp's comment, though this may be less portable than SIGINT).
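
For illustration, Option 1A above amounts to two calls. The following is a minimal Python sketch, not kured code (kured is written in Go); the function name sync_then_reboot is hypothetical. It assumes a Linux libc that exports reboot(int) and a process holding CAP_SYS_BOOT. Calling it restarts the machine immediately, so it is only defined here, never invoked.

```python
import ctypes
import os

# Command value for an immediate restart, from <linux/reboot.h>.
LINUX_REBOOT_CMD_RESTART = 0x01234567


def sync_then_reboot() -> None:
    """Flush filesystem buffers, then ask the kernel to restart right away.

    Requires CAP_SYS_BOOT. systemd is not involved, so services are not
    stopped cleanly: this is the "semi-dirty" reboot of Option 1A.
    """
    os.sync()  # sync(2): write cached data to disk before rebooting
    libc = ctypes.CDLL(None, use_errno=True)
    # libc wrapper: int reboot(int cmd); returns -1 and sets errno on failure.
    if libc.reboot(LINUX_REBOOT_CMD_RESTART) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Without CAP_SYS_BOOT the call is expected to fail with EPERM, which the sketch surfaces as an OSError.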

Comparisons

  • Option 0
    • pros: currently working this way
    • cons: not very secure, pod runs with far more privileges than should be needed to reboot
  • Option 1A:
    • pros: easy to do, most secure (only CAP_SYS_BOOT required)
    • cons: not graceful (the reboot is not systemd-managed)
  • Option 1B:
    • would be the best of both worlds, but after further investigation I am not sure if/how it is possible. Here is example code to issue a CtrlAltDel keystroke, but it uses uinput, which requires root privileges, so that does not help. It might be possible with ioctl, but that could require additional capabilities (using TIOCSTI, CAP_SYS_ADMIN?). Also, users may have masked ctrl-alt-del.target or changed the reboot keystroke from the default CtrlAltDel.
  • Option 2:
    • pros: easy to do, graceful, requires far fewer privileges (only CAP_KILL), more secure than Option 0
    • cons: sending any signal to any host process is still a significant attack surface.

All the options still require hostPID: true.

Anyway, do you think Option 2 is at least a clear improvement over Option 0 and might be suitable as a new default behaviour? And maybe Option 1A (or 1B if possible) could be configurable as more secure alternatives?

@evrardjp
Collaborator

evrardjp commented Jul 28, 2021

Option 2 is, as you mentioned, about equal to Option 1B.
There is an Option 2B, if you follow the systemd code. I did a PoC in one of my testbeds, in which I sent the kill signal SIGRTMIN+15. It was not portable, as some other people tried it without success. I did it with CAP_KILL and hostPID.

Based on our tests, I would say Option 2 should not be the default option.

However, I believe it would be nice to allow people to go for option2 or option3.
For that, I discussed these approaches in the past, and updated the code to go towards it.

I feel it's only a slight change to have an option to not wrap with nsenter.
If we bring that option, it would be easier for a user to override the command to kill -SIGRTMIN+15 1 with the right privileges.

You might be interested in #359, whose security section is relevant to you.

@rptaylor
Author

rptaylor commented Jul 28, 2021

@evrardjp thanks for the comments!
Okay, SIGRTMIN+15 should "immediately reboot the machine". That has the advantage of not relying on ctrl-alt-del.target, but it sounds like it might be less graceful (?), in the same way that the reboot(2) call reboots immediately. SIGRTMIN+5 sounds like the best choice IMHO (it starts the reboot.target unit).

If anything, sending a signal to PID 1 should be the most portable option, in the sense that it has at least some non-zero possibility of working on SysV or other non-systemd systems, unlike /bin/systemctl reboot. A priori most systems should behave according to the systemd documentation, but it could require further testing and investigation on some platforms.

I feel it's only a slight change to have an option to not wrap with nsenter.

That sounds nice and seems relatively easy. But I wonder if there might be any gotchas with executing /bin/kill (packaged in the kured container image), as opposed to doing process.Signal(sig) in the code?
The latter should be cleaner. Not sure if there could be situations where the kill executable might have portability issues with different host OSes or kernels.

Another option would be a configurable property --reboot-signal which, if configured, is used instead of --reboot-command and specifies which signal to send to PID 1. What do you think?
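
To make the in-code signalling idea concrete, here is a minimal Python sketch (kured itself is Go, so this is only an illustration, not the project's implementation). The names parse_reboot_signal and send_reboot_signal are hypothetical. It assumes Linux, where Python exposes signal.SIGRTMIN and signal.SIGRTMAX, and it assumes the pod runs with hostPID: true so that PID 1 is the host's init.

```python
import os
import re
import signal


def parse_reboot_signal(name: str) -> int:
    """Turn a name such as 'SIGINT' or 'SIGRTMIN+5' into a signal number."""
    match = re.fullmatch(r"SIGRTMIN\+(\d+)", name)
    if match:
        number = signal.SIGRTMIN + int(match.group(1))
        if number > signal.SIGRTMAX:
            raise ValueError(f"{name} is above SIGRTMAX")
        return number
    try:
        return int(signal.Signals[name])
    except KeyError:
        raise ValueError(f"unknown signal name: {name}") from None


def send_reboot_signal(name: str, pid: int = 1) -> None:
    """Send the configured signal directly, without running /bin/kill.

    Signalling a process owned by another user needs CAP_KILL.
    """
    os.kill(pid, parse_reboot_signal(name))
```

Signalling from code avoids shipping a kill binary in the image, which is the portability concern raised above; whether the host's init reacts as documented still has to be tested per platform.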

Either way, the final part would be to actually realize the security benefit by adapting the PSP to grant limited capabilities based on which approach is selected. With Helm that could be done with, e.g.:

{{- if .Values.configuration.rebootSignal}}
  privileged: false
  allowedCapabilities: ['KILL']
{{- else }}
  privileged: true
  allowedCapabilities: ['*']
{{ end }}

or similarly {{- if not .Values.configuration.use_nsenter }}

However, if the use_nsenter option would apply to both the reboot command and the sentinel command, the Helm logic could get slightly more complicated. Personally I like the idea of not using an executable command at all to reboot, just send the signal directly, just my 2c.

@evrardjp
Collaborator

evrardjp commented Jul 28, 2021

My notes are very close to what you're proposing. I used both signals.

Even when you think portability isn't a problem with systemd, it actually is. Who really reads what's packaged in the main OSes? Our tests have shown that it sometimes doesn't work. Depends on OS behaviour, apparmor, selinux, etc. Kind + ubuntu without apparmor was not happy last time I tried.

I am not against adding this new feature, as long as it's properly tested.

PS: You should probably check #359, maybe that could interest you, security wise. Because reducing the scope of the ds isn't a complete solution.

PS2: I would say that it's indeed easier to have this in our code. However, the refactoring will be bigger. Not impossible though ;)

@rptaylor
Author

rptaylor commented Jul 28, 2021

@evrardjp Thanks, I did look at #359 but I think some significant security improvements can be achieved without a major architectural redesign (though that could also have valuable security improvements in its own right).

Even when you think portability isn't a problem with systemd, it actually is.
I am not against adding this new feature, as long as it's properly tested.

Certainly. I already tested that SIGINT to PID 1 works as expected and documented on EL7,8. Likewise for SIGRTMIN+5:

Jul 28 22:33:46 el8-test.novalocal systemd[1]: Received SIGRTMIN+5 from PID 1764 (n/a).
...
Jul 28 22:33:47 el8-test.novalocal systemd[1]: Reached target Shutdown.

If we add a new option that is non-default behaviour, then from my point of view, as long as kured does what I tell it to do (send the specified signal to PID 1), it is working correctly, and it is my responsibility (as the user) to make sure I configured kured (and my cluster) correctly to have the desired outcome when the nodes receive that signal. Does that seem reasonable?

PS2: I would say that it's indeed easier to have this in our code. However, the refactoring will be bigger. Not impossible though ;)

Okay, do you suggest proceeding with Option 2, by signaling in the code and adding a --reboot-signal option, rather than invoking /bin/kill?
I can probably at least propose a PR and do testing but might need help with some details on code changes.

Thanks!

@evrardjp
Collaborator

evrardjp commented Jul 29, 2021

I don't see why this can't be done in two PRs, to iterate on this in a "simpler" way.

In any case, we can help you with the PRs! :)

The first PR could tackle adding kill to the kured container, for which you can add tests: it's just a different rebootCommand. That should be a very simple PR.

In the second PR we could probably remove that kill package from the image and refactor the code. That will take longer, because we need to define the right refactor.

@rptaylor
Author

@evrardjp Okay thanks! Sorry I am not sure, do you mean for the 1st PR, it would involve basically copying https://github.com/weaveworks/kured/blob/main/tests/kind/follow-coordinated-reboot.sh with a different rebootCommand configured? I'm not sure where the kured config would go in that script.

@github-actions

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@rptaylor
Author

Still relevant but not sure how to proceed.

@dholbach added the keep label Sep 28, 2021
@pjbgf

pjbgf commented Nov 24, 2021

Thank you @rptaylor for the great work detailing the options.

Options 1 and 2 allow end-users to run kured without being privileged, which then also opens the door to further lock-down capabilities (apparmor, seccomp, etc). Once this is in place it would allow me and the folks from security-profiles-operator to create some security profiles for kured.

In terms of next steps, are we happy to proceed with a PR for option 2 that is gated by some sort of feature switch? This would allow for backwards compatibility, so no negative impact on users using the default nsenter approach.
I am happy to support @rptaylor with the PR and tests if that's the case.

@rptaylor
Author

rptaylor commented Nov 24, 2021

Thanks @pjbgf. As far as I can tell, a rebootSignal option (non-default) would be needed in addition to the existing rebootCommand, so that the container can work with only CAP_KILL instead of being privileged. I am happy to at least take a stab at that if the maintainers agree.

Also if seccomp can limit which kill signals may be sent that could be a possible future improvement, out of scope of current issue.

@admincasper

This has been an open issue for almost two years now, but there is still no implementation on the issue which I find concerning. We want to run Kured in production clusters but are unable to because the baseline security policy to restrict capabilities cannot be applied to Kured.

@evrardjp
Collaborator

I am pretty sure the team is willing to accept any contributions that improve the security of kured... Keep in mind that it's a tough topic... like many engineering topics, it's all about tradeoffs.

I think it's important to know what you want to do with kured, as kured is very flexible. Some of the concerns here can be done without a code change. For the rest, please don't hesitate to contribute too :)

@admincasper

We're using Kured to restart updated nodes every week during downtime, we get alerts in Teams and are very happy with how it's working. It's only a matter of security, or specifically the securityContext for the daemonset. Since Kured is specifically mentioned in Microsoft Documentation and the use-case for Kured is very useful, I was hopeful it was ready for production environments.

I would like to contribute but I don't have the Linux expertise and have no downtime. But it's an issue I'm highly anticipating and supporting!

@rptaylor
Author

@evrardjp okay, can we proceed with adding a rebootSignal option then, to send a configurable signal to PID 1?

@ckotzbauer
Member

@rptaylor Yes, I think this would be the best option.

@rptaylor
Author

rptaylor commented Apr 4, 2022

Going over the code again it seems to me like the best way to proceed IMHO would be:

1st PR

  • add a new configurable option: rebootMethod, default value "command"
  • check and complain if it has an invalid value, for now only "command" would be possible
  • refactor the rebooting code to first determine how to reboot based on rebootMethod; if it is "command", invoke the existing code path

No change in behaviour.

2nd PR

  • add support for another non-default value of rebootMethod, "signal"
  • add a new configurable option: rebootSignal, default value "SIGRTMIN+5"
  • add function to send the configured signal to PID 1
  • when performing the reboot, add: else if rebootMethod is "signal", then invoke the new function to send the signal
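
The two-PR plan above can be sketched as a small dispatcher. This is an illustrative Python sketch only, not kured's Go code; choose_rebooter and its parameters are hypothetical names, and the defaults mirror the values mentioned in this thread (/bin/systemctl reboot and SIGRTMIN+5). It assumes Linux, where signal.SIGRTMIN is available.

```python
import os
import signal
import subprocess
from typing import Callable, Sequence

VALID_METHODS = ("command", "signal")


def choose_rebooter(
    reboot_method: str = "command",
    reboot_command: Sequence[str] = ("/bin/systemctl", "reboot"),
    reboot_signal: int = signal.SIGRTMIN + 5,
) -> Callable[[], None]:
    """Validate rebootMethod once and return the action to run at reboot time."""
    if reboot_method == "command":
        def by_command() -> None:
            # Existing behaviour: run the configured command
            # (kured additionally wraps it with nsenter).
            subprocess.run(list(reboot_command), check=True)
        return by_command
    if reboot_method == "signal":
        def by_signal() -> None:
            # New behaviour: signal PID 1 directly, no executable involved.
            os.kill(1, reboot_signal)
        return by_signal
    raise ValueError(
        f"invalid rebootMethod {reboot_method!r}, expected one of {VALID_METHODS}"
    )
```

Rejecting an unknown value at startup, rather than at reboot time, means a misconfiguration is reported before any node has been drained.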

However, while reviewing the code I also noticed that although the documentation indicates the default behaviour is to check for the existence of a sentinel file, the code nevertheless achieves this by executing a test -f sentinel command:
https://github.com/weaveworks/kured/blob/main/cmd/kured/main.go#L661
So a parallel effort would be needed on the sentinel side in order to realize the final goal of not executing commands on the host with nsenter. It could be achieved via read-only hostPath usage instead. #526
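
As a rough illustration of the hostPath alternative, the sentinel check reduces to a plain file test inside the container. This is a Python sketch with a hypothetical function name, reboot_required; the default path /var/run/reboot-required is an assumption here, and in practice the path would be wherever the host directory is mounted read-only in the pod.

```python
import os


def reboot_required(sentinel_path: str = "/var/run/reboot-required") -> bool:
    """Return True if the sentinel exists as a regular file.

    os.path.isfile follows symlinks, like `test -f`, but runs no command
    on the host and needs no nsenter.
    """
    return os.path.isfile(sentinel_path)
```

One design point to keep in mind: since the sentinel file usually does not exist yet, the read-only mount would likely have to cover its parent directory rather than the file itself.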

@rptaylor
Author

rptaylor commented Apr 5, 2022

For the record I'm a cluster operator, not a developer, and I don't know Go per se. If someone wants to take a stab at a PR please feel free. Otherwise I could put my copy + paste skills to the test at some point, but the refactoring involved to do it properly seems a bit more than I expected at first.

@cloud-az

Hi. What is the status of using Kured without setting privileged: true?

@ckotzbauer
Member

There's no new status right now @cloud-az. But we need to bring things forward for the security-topic, would be glad if you could help us.

@ckotzbauer
Member

ckotzbauer commented Aug 4, 2023

Just re-read all security-related threads. As discussed in this issue, I think we should start with the implementation of a second reboot-type in addition to the current "command" implementation as described by @rptaylor here.

Parallel to that the check logic needs to be adapted with the ro-hostPath approach from #526.
Both changes are not that hard to implement I think, but they need very good testing, where we need all of you folks @cloud-az @rptaylor @pjbgf @evrardjp @jackfrancis

TODOs:

I can implement some parts on my own, but I would be happy if someone could help 😉

@ckotzbauer
Member

Current state: I've also implemented the first two todos locally and tested it in my home-cluster. The signal in general reboots the server (Ubuntu 20.04 LTS) gracefully, that looks good. However, with this securityContext the command errors with "permission denied":

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - CAP_KILL
        drop:
        - '*'
      privileged: false

Does somebody know which other capabilities might be needed?

@rptaylor
Author

rptaylor commented Aug 4, 2023

Does somebody know which other capabilities might be needed?

Is the process still running as UID 0?
As far as capabilities go, it seems like CAP_KILL should be sufficient. To confirm whether your process actually has that capability in the container try getpcaps, or grep ^Cap /proc/<PID>/status and then use capsh --decode= on the displayed strings. Other things that come to mind which could possibly get in the way would be SELinux and seccomp. In the old way, that would be controlled by Pod Security Policy. I am not familiar with the new Pod Security Standards yet so not sure if/how SELinux and seccomp would be applicable there.

The pod will also need hostPID but that was also an old PSP concept. In any case if you set the Pod Security Admission mode to 'warn' it should bypass any of those issues for debugging purposes.

@ckotzbauer
Member

ckotzbauer commented Aug 4, 2023

Thanks for your reply, I will have a more detailed look tomorrow.

@ckotzbauer
Member

ckotzbauer commented Aug 5, 2023

@rptaylor

Is the process still running as UID 0?

yes

The granted capabilities for the container-process are the following, cap_kill is available.

0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap

Neither PSPs nor Pod Security Admission are turned on in the cluster.
By default the RuntimeDefault Seccomp-Profile is used. However, also with the permissive Unconfined profile, the signal is denied. SELinux is disabled on the host.

Edit: Also with securityContext.capabilities.add='*' there are no additional capabilities permitted.
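
For readers who want to check such a mask without capsh, the hex value quoted above (0x00000000a80425fb) can be decoded by hand. The following Python sketch uses hypothetical helper names (decode_caps, effective_caps) and a table of capability bits 0 to 31 taken from <linux/capability.h>; newer capabilities with higher bit numbers are deliberately left out.

```python
# Capability names indexed by bit number (0-31), per <linux/capability.h>.
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
]


def decode_caps(mask: int) -> list:
    """List the capability names whose bit is set in the mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


def effective_caps(pid: str = "self") -> list:
    """Decode the CapEff line of /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(int(line.split()[1], 16))
    raise RuntimeError("CapEff line not found")
```

Decoding the mask this way yields the same fourteen names as the list quoted above, including cap_kill and excluding cap_sys_boot.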

@ckotzbauer
Member

ckotzbauer commented Aug 5, 2023

PR is opened with further testing-instructions: #814

@ckotzbauer added this to the 1.15.0 milestone Aug 12, 2023
@sftim

sftim commented Nov 24, 2023

An idea:

  • set up a systemd unit that triggers a reboot
  • have a Pod that is able to activate the previous unit by writing to a path
  • also have a unit that does the actual reboot

[Unit]
Description=Trigger patch and reboot
BindsTo=prepare-patch-and-reboot-trigger.service
After=prepare-patch-and-reboot-trigger.service

[Path]
# To trigger it, create a file or directory named /run/trigger-patch-and-reboot
PathExists=/run/trigger-patch-and-reboot

[Install]
# Enable the filesystem watching by default
WantedBy=multi-user.target

along with

# thinking of something like https://bootlin.com/pub/conferences/2022/elce/opdenacker-implementing-A-B-system-updates-with-u-boot/opdenacker-implementing-A-B-system-updates-with-u-boot.pdf
[Unit]
Description=Trigger reboot for update

[Service]
Type=oneshot
ExecStart=/bin/systemctl isolate prepare-patch-and-reboot.target

@rptaylor
Author

An idea:

@sftim I like that idea. It would only require write privilege to a hostPath, so the container could be fully unprivileged, with no need to execute commands or send signals. I didn't know systemd could trigger based on a path, neat. Another reboot method, "path", would be needed in addition to the "command" and "signal" ones.

That being said, setting up systemd unit files on the node means more of the solution is living outside kured and would need to be configured out of band. However, a systemd timer (like a cron job) would be useful to apply the OS updates in the first place on nodes and set a sentinel flag. So, that would make a valid rationale for cluster admins to be adding systemd files on their nodes already.
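
On the pod side, the path idea would need nothing more than creating a file. Here is a minimal Python sketch with a hypothetical function name, request_reboot; the default path is the one from the PathExists= line in the unit above, and it assumes the host's directory is mounted writable into the container at the same location.

```python
from pathlib import Path


def request_reboot(trigger: Path = Path("/run/trigger-patch-and-reboot")) -> None:
    """Create the trigger file that the systemd path unit is watching.

    No capabilities, signals, or host commands are involved: only write
    access to the mounted directory is needed.
    """
    trigger.touch(exist_ok=True)
```

Calling it twice is harmless, since the path unit only cares that the file exists.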

@sftim

sftim commented Dec 15, 2023

The systemd method could be one option of several, and would suit cluster admins who make (or consume) custom OS images. A custom node image can include a reboot trigger path for Pods to use.

@rptaylor
Author

This will hopefully be closed by #814, but I made #868 to retain the path idea.
