-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KEP-2400] Node swap updates, GA criterias and clarifications #4701
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: iholder101 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
05552e0
to
4905593
Compare
16c4878
to
b3f9708
Compare
Please update https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/sig-node/2400.yaml and update missing bits of the PRR questionaire. |
b3f9708
to
443db7f
Compare
Thanks @deads2k!
I see you're the assigned approver for alpha/beta.
Done! PTAL :) |
443db7f
to
f4af444
Compare
Signed-off-by: Itamar Holder <[email protected]>
…avior Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
f4af444
to
4124649
Compare
/retitle [KEP-2400] Node swap ppdates, GA criterias and clarifications |
D'oh /retitle [KEP-2400] Node swap updates, GA criterias and clarifications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
Here's a mix of feedback; I hope it is all useful.
@@ -1,4 +1,4 @@ | |||
# KEP-2400: Node system swap support | |||
# KEP-2400: Node system swap enablement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# KEP-2400: Node system swap enablement | |
# KEP-2400: Node memory swap support |
The authoritative title is the title of the KEP issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 1 should read
title: Node memory swap support
The authoritative title is the title of the KEP issue.
When `--fail-swap-on=false` is provided to Kubelet but swap is not configured | ||
otherwise it is guaranteed that, by default, no Kubernetes workloads would | ||
be able to utilize swap. However, everything outside of kubelet's reach | ||
(e.g. system daemons, kubelet, etc) would be able to use swap. | ||
|
||
## Infrastructure Needed (Optional) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Infrastructure Needed (Optional) | |
## Infrastructure Needed |
(we do need this section)
@@ -431,6 +457,7 @@ To avoid exhausting swap on a node, `UnlimitedSwap` was dropped from the API in | |||
#### Security risk | |||
|
|||
Enabling swap on a system without encryption poses a security risk, as critical information, such as Kubernetes secrets, may be swapped out to the disk. If an unauthorized individual gains access to the disk, they could potentially obtain these secrets. To mitigate this risk, it is recommended to use encrypted swap. However, handling encrypted swap is not within the scope of kubelet; rather, it is a general OS configuration concern and should be addressed at that level. Nevertheless, it is essential to provide documentation that warns users of this potential issue, ensuring they are aware of the potential security implications and can take appropriate steps to safeguard their system. | |||
This is now documented and published, e.g. in this blog post: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now documented and published, e.g. in this blog post: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/. | |
The documentation updates are required; there is already a [blog article](https://kubernetes.io/blog/2023/08/24/swap-linux-beta/) that mentions the security implications. |
- KEPs describe the “desired state”; we mostly don't document progress inline.
- Security documentation needs to be done within the guidance itself, such as in https://kubernetes.io/docs/concepts/security/security-checklist/, and not just in blog articles.
* If the kernel version equals or is above 6.4, the tmpfs noswap option is being used when necessary. | ||
* Else, kubelet would try to mount a dummy volume with the tmpfs noswap option to understand whether the option is | ||
backported. If the mount succeeds, the tmpfs noswap option is being used when necessary. | ||
* Else, kubelet would raise a warning about the option not being supported and the possible risk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Where does the warning appear?
@@ -561,6 +603,46 @@ type SwapStats struct { | |||
} | |||
``` | |||
|
|||
In addition, we've added swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, we've added swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and | |
In addition, we'll add swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and |
This is framed as a proposal to enhance, and not a changelog. So the proposed changes should be future tense.
@@ -561,6 +603,46 @@ type SwapStats struct { | |||
} | |||
``` | |||
|
|||
In addition, we've added swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and | |||
`/metrics/resource`): | |||
```shell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not a valid shell script. Use ```console instead.
pod_swap_usage_bytes{namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 0 1687950863144 | ||
``` | ||
|
||
```shell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not a valid shell script. Use ```console instead.
- Nodes with swap enabled -> `node.swap.swapAvailableBytes` should be non-zero. | ||
- Nodes with memory pressure -> `node.swap.swapUsageBytes` should be non-zero. | ||
- Containers that reach their memory limit threshold -> `pods[i].containers[i].swap.swapUsageBytes` should be non-zero. | ||
- Pods with containers that reach their memory limit threshold -> `pods[i].swap.swapUsageBytes` should be non-zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit)
- Nodes with swap enabled -> `node.swap.swapAvailableBytes` should be non-zero. | |
- Nodes with memory pressure -> `node.swap.swapUsageBytes` should be non-zero. | |
- Containers that reach their memory limit threshold -> `pods[i].containers[i].swap.swapUsageBytes` should be non-zero. | |
- Pods with containers that reach their memory limit threshold -> `pods[i].swap.swapUsageBytes` should be non-zero. | |
- Nodes with swap enabled → `node.swap.swapAvailableBytes` should be non-zero. | |
- Nodes with memory pressure → `node.swap.swapUsageBytes` should be non-zero. | |
- Containers that reach their memory limit threshold → `pods[i].containers[i].swap.swapUsageBytes` should be non-zero. | |
- Pods with containers that reach their memory limit threshold → `pods[i].swap.swapUsageBytes` should be non-zero. |
@@ -33,7 +33,7 @@ latest-milestone: "v1.30" | |||
milestone: | |||
alpha: "v1.22" | |||
beta: "v1.30" | |||
stable: "TBD" | |||
stable: "v1.32" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stable: "v1.32" | |
#stable: "v1.32" |
(AIUI)
We can put in a suggestion, but until we're planning that release, it's not a commitment. If we are planning that release, we should change latest-milestone
and stage
I think.
Add updates, GA criterias and clarifications
This PR updates the KEP in the following ways:
Emphasize that this KEP is about basic swap enablement
The original KEP indicated that pod-level swap APIs are out of scope:
enhancements/keps/sig-node/2400-node-swap/README.md
Lines 163 to 166 in 155a949
enhancements/keps/sig-node/2400-node-swap/README.md
Lines 142 to 144 in 155a949
However, the lack of APIs and the implicit nature of the current implementation sometimes brings suggestions to extend the API under this KEP.
This KEP focuses on a basic swap enablement. Follow-up KEPs regarding several topics (e.g. customization, zram/zswap suport, and more) will be introduced in the near future, in which we would be able to design and implement each extension in a focused way.
To ensure we're on the same page, this topic was recently raised in a sig-node meeting. In this meeting there was a very broad consensus that this approach makes sense, especially since the NodeSwap feature is important to many different parties which want it to "just work".
This PR updates the KEP to emphasize this approach.
GA criterias
The PR adds GA criterias, alongside the intent to GA in version 1.32.
Make sure PRR is ready
Updates
Since the last KEP updates, many improvements were made and many concerns were addressed. For example:
This PR updates the KEP to reflect these updates.