-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reasonable --image-gc-high-threshold #4739
base: master
Are you sure you want to change the base?
Conversation
Hi @olderTaoist. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@haircommander @SergeyKanzhelev discuss here |
/ok-to-test |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: olderTaoist The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@haircommander @SergeyKanzhelev please review again |
|
||
### Goals | ||
|
||
discuss reasonable values of `--image-gc-high-threshold` for different scenarios, and constrain them by some means. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generally a goal should be: protect users from inopportune configurations, and fix the defaults so they make more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You said it more summarily. To protect users from inopportune configurations, we should first figure out the needs of users in different scenarios, and then take corresponding protection measures. What you say can be part of the goal.
|
||
#### Story 2 | ||
|
||
Keep the current usage that turn off image garbage collection by setting `--image-gc-high-threshold` 100%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this possible already today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i'm just illustrating different scenarios
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think typically the user stories are more like "what problem does adding this feature solve". in this case, it's really "As a new cluster admin, I'd like for image garbage collection to trigger before a node is under disk pressure, and would like validation to protect that expectation". The two scenarios you've described are available today, it's just not the default, and when an admin misconfigures, they get no insight
### Monitoring Requirements | ||
|
||
Monitor the metrics | ||
- "kubelet_image_gc_before_storage_eviction" that contains `image-gc-threshold` and `imagefs-available` labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specifically: I think this metric should say whether the image-gc-threshold is higher than 100-imagefs-available
, which would be a breaking configuration in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of the metric is exactly what you mean, with 1 for image-gc-threshold greater than 100-imagefs-available
and 0 for image-gc-threshold less than or equal to 100-imagefs-available
, the metric labels let users see intuitively the value of the corresponding configuration
- Impact of its outage on the feature: | ||
- Impact of its degraded performance or high-error rates on the feature: | ||
--> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll want to put N/A for any section that's not relevant, but some of these may be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
approvers: | ||
- "@haircommander" | ||
- "@SergeyKanzhelev" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll need a sig node approver for this, but we won't determine that until 1.32 cycle begins, so we needn't update it now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i change to TBD
022a7da
to
ee590a2
Compare
## Design Details | ||
|
||
Add `ImageGCBeforeStorageEviction` feature gate to kubelet. | ||
When the feature is turned on, the value of `--image-gc-high-threshold` must be smaller than value of `100 - imagefs.available`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the value of --image-gc-high-threshold
is not less than "100 - imagefs.available", what will kubelet do? Will it panic, or will this field become ineffective? Can you explain this in detail here?
##### Unit tests | ||
|
||
|
||
- `pkg/kubelet/apis/config/validation`: `-:-:-` - `0%` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the test coverage 0%? The test coverage of the latest master branch is 97.1%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, just figured out what Unit tests
means. already updated as https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit stated.
## Design Details | ||
|
||
Add `ImageGCBeforeStorageEviction` feature gate to kubelet. | ||
When the feature is turned on, the value of `--image-gc-high-threshold` must be smaller than value of `100 - imagefs.available`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we change the default value of --image-gc-high-threshold
? What will it be changed to? Please also explain this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Can someone take a look? @bart0sh @SergeyKanzhelev @haircommander @sftim @HirazawaUi |
KEP-4120: add
ImageGCBeforeStorageEviction
feature gate to kubeletOne-line PR description: discuss reasonable values of
--image-gc-high-threshold
for different scenarios, and constrain them by some means.Issue link: relationship between --image-gc-high-threshold and imagefs.available #4727
Other comments: