Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
When some of a node's resources break down and Kubernetes cannot detect that,
Cluster Autoscaler cannot make a correct scale-up decision if that node is selected as the NodeInfo.
This breaks the assumption that all machines inside a node group have identical capacity.
For example,
if each node has 3 GPUs but 1 GPU on a certain node has crashed and that node is selected as the NodeInfo,
Cluster Autoscaler concludes that the nodes inside the node group have 2 GPUs and does not scale up to schedule pods requesting 3 GPUs.
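The GPU example above can be sketched in a few lines of Go. This is only a simplified illustration of the fit check, not Cluster Autoscaler's actual code: the `Resources` type and the `fits` function are hypothetical names introduced here, and real scheduling simulation considers far more than a single resource count.

```go
package main

import "fmt"

// Resources maps a resource name to a quantity.
// Hypothetical simplified type, used only for this illustration.
type Resources map[string]int64

// fits reports whether every requested resource is covered by capacity.
// A resource missing from capacity counts as zero.
func fits(capacity, request Resources) bool {
	for name, want := range request {
		if capacity[name] < want {
			return false
		}
	}
	return true
}

func main() {
	healthy := Resources{"gpu": 3}
	degraded := Resources{"gpu": 2} // one of the three GPUs has crashed
	pod := Resources{"gpu": 3}

	// With a healthy node as the template, scaling up looks useful.
	fmt.Println(fits(healthy, pod)) // true
	// With the degraded node as the template, scaling up looks pointless.
	fmt.Println(fits(degraded, pod)) // false
}
```

If the degraded node happens to be the one chosen as the template, the whole node group appears unable to host the pod, even though a fresh node would have 3 GPUs.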
Describe the solution you'd like.:
Our solution makes Cluster Autoscaler read the capacity annotations on the MachineSet
only when a particular annotation, such as "spec-fixed: enabled", is set on the MachineSet.
The capacity annotations tell Cluster Autoscaler the size of the nodes in a node group.
If the MachineSet does not have that annotation, Cluster Autoscaler selects a node for the NodeInfo as usual.
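A minimal sketch of the proposed selection logic, restricted to the GPU count, might look like the following. It is not an implementation against the real Cluster Autoscaler code base: `templateGPUs` is a hypothetical helper, "spec-fixed" is only the example opt-in key from this issue, and the capacity annotation key is assumed to be the one the cluster-api provider uses for scale-from-zero.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Example opt-in annotation from this issue; the final name is undecided.
	fixedSpecKey = "spec-fixed"
	// Assumed capacity annotation key for the GPU count on a MachineSet.
	gpuCountKey = "capacity.cluster-autoscaler.kubernetes.io/gpu-count"
)

// templateGPUs returns the GPU count to use for the node template.
// It trusts the MachineSet annotations only when the opt-in annotation is
// set to "enabled"; otherwise, or when the capacity annotation is missing
// or malformed, it falls back to the value observed on the sampled node.
func templateGPUs(annotations map[string]string, nodeGPUs int64) int64 {
	if annotations[fixedSpecKey] != "enabled" {
		return nodeGPUs
	}
	n, err := strconv.ParseInt(annotations[gpuCountKey], 10, 64)
	if err != nil {
		return nodeGPUs
	}
	return n
}

func main() {
	optedIn := map[string]string{fixedSpecKey: "enabled", gpuCountKey: "3"}
	// The sampled node reports 2 GPUs because one has crashed.
	fmt.Println(templateGPUs(optedIn, 2)) // 3
	fmt.Println(templateGPUs(nil, 2))     // 2
}
```

Falling back to the sampled node keeps today's behaviour for every MachineSet that does not opt in.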
Describe any alternative solutions you've considered.: NA
Additional context.:
Current Implementation
When Cluster Autoscaler scales up, it selects a node at random from each node group and uses it as the node template to judge whether scaling up that group is possible.
If a node group has no nodes, Cluster Autoscaler makes that decision from the capacity annotations, which specify the machine specification on the MachineSet in cluster-api.
What we would like to realize
Even if a node group already has nodes, Cluster Autoscaler would create the node template from the capacity annotations
when a particular annotation is set on the MachineSet.
We would like to discuss and study the feasibility of this feature.
If the node is only partially healthy, is there a way to tell that based on the k8s Node object? Some condition perhaps? If the answer is yes, we could instead just update this check to prevent Cluster Autoscaler from picking such nodes to be used as templates:
Which component are you using?:
cluster-autoscaler