Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KEP-2400] Node swap updates, GA criterias and clarifications #4701

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

iholder101
Copy link
Contributor

@iholder101 iholder101 commented Jun 6, 2024

  • One-line PR description:
    Add updates, GA criterias and clarifications
  • Other comments:

This PR updates the KEP in the following ways:

Emphasize that this KEP is about basic swap enablement
The original KEP indicated that pod-level swap APIs are out of scope:

- Allocating swap on a per-workload basis with accounting (e.g. pod-level
specification of swap). If desired, this should be designed and implemented
as part of a follow-up KEP. This KEP is a prerequisite for that work. Hence,
swap will be an overcommitted resource in the context of this KEP.

This KEP will be limited in scope to the first two scenarios. The third can be
addressed in a follow-up KEP. The enablement work that is in scope for this KEP
will be necessary to implement the third scenario.

However, the lack of APIs and the implicit nature of the current implementation sometimes brings suggestions to extend the API under this KEP.

This KEP focuses on a basic swap enablement. Follow-up KEPs regarding several topics (e.g. customization, zram/zswap suport, and more) will be introduced in the near future, in which we would be able to design and implement each extension in a focused way.

To ensure we're on the same page, this topic was recently raised in a sig-node meeting. In this meeting there was a very broad consensus that this approach makes sense, especially since the NodeSwap feature is important to many different parties which want it to "just work".

This PR updates the KEP to emphasize this approach.

GA criterias
The PR adds GA criterias, alongside the intent to GA in version 1.32.

Make sure PRR is ready

Updates
Since the last KEP updates, many improvements were made and many concerns were addressed. For example:

  • Memory-backed volumes
  • Added metrics
  • Kubelet Configuration examples
  • more

This PR updates the KEP to reflect these updates.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 6, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iholder101
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 6, 2024
@iholder101
Copy link
Contributor Author

@iholder101 iholder101 force-pushed the kep2400/post_beta2 branch 2 times, most recently from 16c4878 to b3f9708 Compare June 9, 2024 09:17
@deads2k
Copy link
Contributor

deads2k commented Jun 10, 2024

Please update https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/sig-node/2400.yaml and update missing bits of the PRR questionaire.

@iholder101
Copy link
Contributor Author

Thanks @deads2k!

Please update master/keps/prod-readiness/sig-node/2400.yaml

I see you're the assigned approver for alpha/beta.
Is it OK to also assign you as the approver for GA?

and update missing bits of the PRR questionaire.

Done! PTAL :)

Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
@sftim
Copy link
Contributor

sftim commented Jun 24, 2024

/retitle [KEP-2400] Node swap ppdates, GA criterias and clarifications

@k8s-ci-robot k8s-ci-robot changed the title [KEP-2400] Updates, GA criterias and clarifications [KEP-2400] Node swap ppdates, GA criterias and clarifications Jun 24, 2024
@sftim
Copy link
Contributor

sftim commented Jun 24, 2024

D'oh

/retitle [KEP-2400] Node swap updates, GA criterias and clarifications

@k8s-ci-robot k8s-ci-robot changed the title [KEP-2400] Node swap ppdates, GA criterias and clarifications [KEP-2400] Node swap updates, GA criterias and clarifications Jun 24, 2024
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Here's a mix of feedback; I hope it is all useful.

@@ -1,4 +1,4 @@
# KEP-2400: Node system swap support
# KEP-2400: Node system swap enablement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# KEP-2400: Node system swap enablement
# KEP-2400: Node memory swap support

The authoritative title is the title of the KEP issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 1 should read

title: Node memory swap support

The authoritative title is the title of the KEP issue.

When `--fail-swap-on=false` is provided to Kubelet but swap is not configured
otherwise it is guaranteed that, by default, no Kubernetes workloads would
be able to utilize swap. However, everything outside of kubelet's reach
(e.g. system daemons, kubelet, etc) would be able to use swap.

## Infrastructure Needed (Optional)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Infrastructure Needed (Optional)
## Infrastructure Needed

(we do need this section)

@@ -431,6 +457,7 @@ To avoid exhausting swap on a node, `UnlimitedSwap` was dropped from the API in
#### Security risk

Enabling swap on a system without encryption poses a security risk, as critical information, such as Kubernetes secrets, may be swapped out to the disk. If an unauthorized individual gains access to the disk, they could potentially obtain these secrets. To mitigate this risk, it is recommended to use encrypted swap. However, handling encrypted swap is not within the scope of kubelet; rather, it is a general OS configuration concern and should be addressed at that level. Nevertheless, it is essential to provide documentation that warns users of this potential issue, ensuring they are aware of the potential security implications and can take appropriate steps to safeguard their system.
This is now documented and published, e.g. in this blog post: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This is now documented and published, e.g. in this blog post: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/.
The documentation updates are required; there is already a [blog article](https://kubernetes.io/blog/2023/08/24/swap-linux-beta/) that mentions the security implications.
  1. KEPs describe the “desired state”; we mostly don't document progress inline.
  2. Security documentation needs to be done within the guidance itself, such as in https://kubernetes.io/docs/concepts/security/security-checklist/, and not just in blog articles.

* If the kernel version equals or is above 6.4, the tmpfs noswap option is being used when necessary.
* Else, kubelet would try to mount a dummy volume with the tmpfs noswap option to understand whether the option is
backported. If the mount succeeds, the tmpfs noswap option is being used when necessary.
* Else, kubelet would raise a warning about the option not being supported and the possible risk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Where does the warning appear?

@@ -561,6 +603,46 @@ type SwapStats struct {
}
```

In addition, we've added swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In addition, we've added swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and
In addition, we'll add swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and

This is framed as a proposal to enhance, and not a changelog. So the proposed changes should be future tense.

@@ -561,6 +603,46 @@ type SwapStats struct {
}
```

In addition, we've added swap to stats to Summary API and Prometheus endpoints (`/stats/summary` and
`/metrics/resource`):
```shell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not a valid shell script. Use ```console instead.

pod_swap_usage_bytes{namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 0 1687950863144
```

```shell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not a valid shell script. Use ```console instead.

Comment on lines +1114 to +1117
- Nodes with swap enabled -> `node.swap.swapAvailableBytes` should be non-zero.
- Nodes with memory pressure -> `node.swap.swapUsageBytes` should be non-zero.
- Containers that reach their memory limit threshold -> `pods[i].containers[i].swap.swapUsageBytes` should be non-zero.
- Pods with containers that reach their memory limit threshold -> `pods[i].swap.swapUsageBytes` should be non-zero.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit)

Suggested change
- Nodes with swap enabled -> `node.swap.swapAvailableBytes` should be non-zero.
- Nodes with memory pressure -> `node.swap.swapUsageBytes` should be non-zero.
- Containers that reach their memory limit threshold -> `pods[i].containers[i].swap.swapUsageBytes` should be non-zero.
- Pods with containers that reach their memory limit threshold -> `pods[i].swap.swapUsageBytes` should be non-zero.
- Nodes with swap enabled `node.swap.swapAvailableBytes` should be non-zero.
- Nodes with memory pressure `node.swap.swapUsageBytes` should be non-zero.
- Containers that reach their memory limit threshold `pods[i].containers[i].swap.swapUsageBytes` should be non-zero.
- Pods with containers that reach their memory limit threshold `pods[i].swap.swapUsageBytes` should be non-zero.

@@ -33,7 +33,7 @@ latest-milestone: "v1.30"
milestone:
alpha: "v1.22"
beta: "v1.30"
stable: "TBD"
stable: "v1.32"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
stable: "v1.32"
#stable: "v1.32"

(AIUI)
We can put in a suggestion, but until we're planning that release, it's not a commitment. If we are planning that release, we should change latest-milestone and stage I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Needs Reviewer
Development

Successfully merging this pull request may close these issues.

None yet

7 participants