add validator nvidia driver whether install success #157
77f7b36 to f001188
In this version, GPU nodes must be labeled manually (label "gpu=on") during installation, so we can safely assume that every node with the label "gpu=on" is "gpu-ready".
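For reference, the manual label mentioned above might look like this on a Node object. This is a minimal sketch, not taken from the PR: the node name gpu-node-1 is hypothetical, and only the gpu=on label comes from the discussion. The value is quoted because YAML 1.1 parsers read a bare on as a boolean, while Kubernetes label values must be strings.

```yaml
# Sketch of a manually labeled GPU node (node name is hypothetical).
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
  labels:
    # Quoted so the value stays the string "on" rather than a YAML boolean.
    gpu: "on"
```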
Thanks for your suggestion. I have a different opinion on one point: "and use .Values.devicePlugin.validator.enabled to switch between "manual-mode" and "auto-mode" (default manual mode)". We can add
Agreed :), that looks better.
e4b52aa to 84f807b
Please update the "Label your nodes" section in README.md and README_CN.md to introduce your manual and automatic labeling strategies.
84f807b to effd591
It's just a question: I think this PR only introduces a small feature, but it requires a new
Can you add some test screenshots?
Added part of the testing info.
charts/hami/templates/device-plugin/daemonsetvalidator-configmap.yaml
Additionally, can we start introducing keywords like
effd591 to 637f675
Signed-off-by: lengrongfu <[email protected]>
637f675 to ca232fa
They need to be modified together, and there are still many.
I still think this PR is overly complicated. Please give me some time; I will look into whether there are other ways to implement this.
@chaunceyjiang Hi, any progress on this? Or we could merge the PR first and optimize later.
/hold I will continue to explore this issue over the next few days.
Fixes: #136
Add a driver-validator DaemonSet workload that runs on each node and validates whether the NVIDIA driver was installed successfully. If the installation succeeded, it adds the node label hami.io/driver-validator="true"; otherwise it adds hami.io/driver-validator="false". The device-plugin DaemonSet adds a nodeSelector of hami.io/driver-validator="true".

Test Result
device-plugin pod does not exist.
device-plugin starts successfully.
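
The node selection described in the PR description above can be sketched as a fragment of the device-plugin DaemonSet's pod template. This is an illustrative assumption about where the selector sits, not the chart's actual template; only the label key and value come from the description. The value is quoted because nodeSelector values must be strings, and a bare true would parse as a YAML boolean.

```yaml
# Sketch only: fragment of a DaemonSet spec, not the actual HAMi chart template.
spec:
  template:
    spec:
      nodeSelector:
        # Schedule device-plugin pods only on nodes that the
        # driver-validator has labeled as having a working driver.
        hami.io/driver-validator: "true"
```

With a selector like this, nodes labeled hami.io/driver-validator="false", or not labeled at all, would not receive a device-plugin pod.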