Add an option to create a node template from capacity annotations even if there are nodes inside a node group #7380

ttsuuubasa · 2024-10-11T07:22:16Z

Which component are you using?:

cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

The problem is that when some of a node's resources fail without Kubernetes detecting the failure,
Cluster Autoscaler cannot make a correct scale-up decision if that node is selected as the NodeInfo.
This violates the assumption that all machines inside a node group have identical capacity.

For example,
if each node has 3 GPUs but 1 GPU on a certain node has crashed and that node is selected as the NodeInfo,
Cluster Autoscaler assumes that every node in the node group has 2 GPUs and does not scale up to schedule pods requesting 3 GPUs.
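To make the failure mode concrete, here is a minimal sketch of the fit check, assuming a simplified resource map in place of the real Kubernetes types. The `Resources` type and the `fits` helper are illustrative names only and are not part of Cluster Autoscaler.

```go
package main

import "fmt"

// Resources maps a resource name to a quantity.
// Illustrative stand-in for a Kubernetes ResourceList, not the real type.
type Resources map[string]int64

// fits reports whether every requested quantity is within the template capacity.
func fits(request, capacity Resources) bool {
	for name, want := range request {
		if capacity[name] < want {
			return false
		}
	}
	return true
}

func main() {
	pod := Resources{"nvidia.com/gpu": 3}

	// Template taken from a healthy node of the group.
	healthy := Resources{"nvidia.com/gpu": 3}
	// Template taken from the node that lost one GPU.
	degraded := Resources{"nvidia.com/gpu": 2}

	fmt.Println(fits(pod, healthy))  // true
	fmt.Println(fits(pod, degraded)) // false: the group looks too small, so no scale-up
}
```

With the degraded node as the template, the pod appears unschedulable on any new node of the group, even though a fresh machine would have 3 GPUs.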

Describe the solution you'd like.:

Our solution makes Cluster Autoscaler use the capacity annotations on the MachineSet
only when a particular annotation, such as "spec-fixed: enabled", is set on the MachineSet.
The capacity annotations tell Cluster Autoscaler the size of the nodes in a node group.
If the MachineSet does not have this annotation, Cluster Autoscaler selects a node for the NodeInfo as usual.
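A rough sketch of the proposed selection logic follows. It is only an illustration under assumptions: `spec-fixed` is the example annotation name from this issue, the `gpu-count` key is assumed to be the capacity annotation used by the cluster-api provider, and `templateGPUs` is a hypothetical helper, not an existing function.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Hypothetical opt-in annotation, taken from the example in this issue.
	specFixedKey = "spec-fixed"
	// Capacity annotation key, assumed to match the cluster-api provider.
	gpuCountKey = "capacity.cluster-autoscaler.kubernetes.io/gpu-count"
)

// templateGPUs returns the GPU count to use for the node template and
// whether it was taken from the MachineSet annotations.
func templateGPUs(annotations map[string]string, sampledNodeGPUs int64) (int64, bool) {
	if annotations[specFixedKey] != "enabled" {
		// Opt-in annotation absent: keep the current behavior.
		return sampledNodeGPUs, false
	}
	n, err := strconv.ParseInt(annotations[gpuCountKey], 10, 64)
	if err != nil {
		// Missing or malformed capacity annotation: fall back to the sampled node.
		return sampledNodeGPUs, false
	}
	return n, true
}

func main() {
	machineSet := map[string]string{
		specFixedKey: "enabled",
		gpuCountKey:  "3",
	}
	// The sampled node reports only 2 GPUs because one has crashed.
	gpus, fromAnnotations := templateGPUs(machineSet, 2)
	fmt.Println(gpus, fromAnnotations)
}
```

Falling back to the sampled node when the capacity annotation is missing or malformed is one possible design choice; rejecting the configuration would be another.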

Describe any alternative solutions you've considered.: NA

Additional context.:

Current Implementation

When Cluster Autoscaler scales up, it selects a node randomly from every node group and judges whether scale-up is possible based on the resulting node template.
If there is no node inside a node group, Cluster Autoscaler makes this decision from the capacity annotations, which specify the machine specification on the MachineSet in cluster-api.

What we would like to realize

Even if there are nodes inside a node group, Cluster Autoscaler creates the node template from the capacity annotations
when a particular annotation is set on the MachineSet.

We would like to discuss and study the feasibility of this function.

adrianmoisey · 2024-10-11T18:03:35Z

/area cluster-autoscaler

x13n · 2024-10-18T09:01:47Z

If the node is only partially healthy, is there a way to tell that based on the k8s Node object? Some condition perhaps? If the answer is yes, we could instead just update this check to prevent Cluster Autoscaler from picking such nodes to be used as templates:

autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go

Lines 210 to 215 in ff2f0ce

    func isNodeGoodTemplateCandidate(node *apiv1.Node, now time.Time) bool {
        ready, lastTransitionTime, _ := kube_util.GetReadinessState(node)
        stable := lastTransitionTime.Add(stabilizationDelay).Before(now)
        schedulable := !node.Spec.Unschedulable
        return ready && stable && schedulable
    }

hase1128 · 2024-10-20T13:06:23Z

If the GPU is not visible to the OS for some reason, the OS does not report it as an error, so the K8s node would still be considered healthy.

ttsuuubasa added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 11, 2024

k8s-ci-robot added the area/cluster-autoscaler label Oct 11, 2024
