
Question: Support for GPUs from NVIDIA, AMD and Intel #7374

Open
ttsuuubasa opened this issue Oct 9, 2024 · 4 comments

Comments

@ttsuuubasa

Hello everyone,

I have some questions about support for the GPUs that various vendors provide.

Question 1:
Which vendors' GPUs are supported by Cluster Autoscaler?

GPUs are provided by several vendors, such as NVIDIA, AMD, and Intel.
I would like to know whether Cluster Autoscaler correctly recognizes these GPUs and autoscales nodes accordingly.

My understanding is that the GPU resource name is hardcoded to "nvidia.com/gpu", as shown below.
Therefore, the --gpu-total option only works when using NVIDIA GPUs.
However, autoscaling itself may still work if the Kubernetes scheduler plugins in PredicateChecker correctly simulate the packing of pods that request non-NVIDIA GPUs.

Could you please tell me how Cluster Autoscaler behaves when we use GPUs provided by various vendors?

// From cluster-autoscaler's GpuCustomResourcesProcessor; note the hardcoded
// gpu.ResourceNvidiaGPU ("nvidia.com/gpu") used to read node allocatable.
func (p *GpuCustomResourcesProcessor) GetNodeGpuTarget(GPULabel string, node *apiv1.Node, nodeGroup cloudprovider.NodeGroup) (CustomResourceTarget, errors.AutoscalerError) {
	gpuLabel, found := node.Labels[GPULabel]
	if !found {
		return CustomResourceTarget{}, nil
	}
	gpuAllocatable, found := node.Status.Allocatable[gpu.ResourceNvidiaGPU]

Question 2:
Can the following annotations be used for scale-from-zero in cluster-api with a GPU type other than NVIDIA?

capacity.cluster-autoscaler.kubernetes.io/gpu-type: "nvidia.com/gpu"
capacity.cluster-autoscaler.kubernetes.io/gpu-count: "2"
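
For concreteness, the non-NVIDIA variant I have in mind would look something like the following (a sketch only; whether the cluster-api provider accepts a resource name such as amd.com/gpu here is exactly what I am asking):

# Hypothetical scale-from-zero annotations for an AMD GPU node group
capacity.cluster-autoscaler.kubernetes.io/gpu-type: "amd.com/gpu"
capacity.cluster-autoscaler.kubernetes.io/gpu-count: "2"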

Question 3:
Does Cluster Autoscaler correctly scale in nodes with non-NVIDIA GPUs?

Cluster Autoscaler scales in GPU nodes when GPU usage falls below a threshold, based on observing the scheduled pods.
I would like to know whether this judgement is made correctly when using AMD/Intel GPUs.

Question 4:
Does Cluster Autoscaler rely on device plugins for handling nodes with GPUs?

Recently, Dynamic Resource Allocation (DRA) has been introduced for GPU management.
Is my understanding correct that Cluster Autoscaler support for DRA is still in progress and does not work yet?

@adrianmoisey
Member

/kind cluster-autoscaler

@k8s-ci-robot
Contributor

@adrianmoisey: The label(s) kind/cluster-autoscaler cannot be applied, because the repository doesn't have them.

In response to this:

/kind cluster-autoscaler

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@adrianmoisey
Member

/area cluster-autoscaler

@MaciekPytel
Contributor

Which vendors' GPUs are supported by Cluster Autoscaler?

GPUs are provided by several vendors, such as NVIDIA, AMD, and Intel.
I would like to know whether Cluster Autoscaler correctly recognizes these GPUs and autoscales nodes accordingly.

Currently (pre-DRA) node resources are just a map[string]quantity. Neither the scheduler nor CA really understands what a particular resource is; it's literally just comparing matching keys in two dictionaries (pod requests and node allocatable). So, in principle, CA works with any resource.
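
As a concrete illustration of that key matching (a sketch only; amd.com/gpu is just an example extended-resource name, typically exposed by that vendor's device plugin, and the image is hypothetical), a pod requesting such a resource is matched purely by key against node allocatable:

# Illustrative pod: the scheduler (and CA's simulation) only checks that some node's
# allocatable map contains the key "amd.com/gpu" with enough quantity left.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-example
spec:
  containers:
    - name: app
      image: example.com/gpu-app:latest   # hypothetical image
      resources:
        limits:
          amd.com/gpu: 1                  # extended resources are requested via limits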

There are two tricky parts though:

  • If you have a node in a nodeGroup, CA will copy it and assume every new node will be identical. When scaling from 0, CA needs to know what a new node will look like, including the allocatable of any resource, such as the GPUs you mention. How you pass this in differs per provider (e.g. in AWS you can specify it by setting a tag on the ASG); see the sketch after this list.
  • Resources managed by device plugins generally only show up in node allocatable after a daemonset installs the drivers. This means there is a window when a new node is already Ready as far as Kubernetes status conditions go, but it doesn't advertise a GPU yet. From CA's perspective that node has no GPU and the pods requesting a GPU are still pending, so CA will trigger another scale-up, not understanding that the first node is still initializing.
    • This can be solved by creating the node with a startup taint (see our README) and removing the taint once the GPU is visible in allocatable; see the sketch after this list.
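
A rough sketch of both pieces, under stated assumptions: the ASG tag below follows the AWS provider's node-template convention as I understand it, the taint key and node name are purely illustrative, and amd.com/gpu again stands in for any non-NVIDIA resource name; check the provider docs and the README for the exact spelling.

# (1) Scale-from-0 hint as an AWS ASG tag (assumed tag format from the AWS provider docs);
#     it tells CA what allocatable a brand-new node in this group will advertise:
#     k8s.io/cluster-autoscaler/node-template/resources/amd.com/gpu = "2"

# (2) Illustrative startup taint, registered at node creation (e.g. via kubelet
#     --register-with-taints) and removed once the GPU shows up in allocatable,
#     so CA treats the node as "still initializing" instead of "has no GPU":
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-example
spec:
  taints:
    - key: startup-taint.cluster-autoscaler.kubernetes.io/gpu-not-ready   # assumed startup-taint prefix; see README
      value: "true"
      effect: NoSchedule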

The processor code you linked generally aims at solving the problems above specifically for nvidia.com/gpu, so NVIDIA GPUs generally work out of the box with no extra setup required. Other GPUs should work fine; they just need the bit of extra setup described above.

Therefore, the --gpu-total option only works when using NVIDIA GPUs.

Cluster Autoscaler scales in GPU nodes when GPU usage falls below a threshold, based on observing the scheduled pods.
I would like to know whether this judgement is made correctly when using AMD/Intel GPUs.

Only a resource (in the sense of a key in node allocatable) called "nvidia.com/gpu" is recognized as a GPU for the purposes of resource limits and scale-down thresholds. Whether that key actually represents an NVIDIA GPU is irrelevant to Cluster Autoscaler.
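
As an illustration only (the <gpu_type>:<min>:<max> format is as I recall it from the FAQ, and the numbers are arbitrary), the cluster-wide GPU limit keys off that exact resource name:

# Fragment of a hypothetical Cluster Autoscaler container spec; the limit below only
# takes effect for nodes whose allocatable contains the key "nvidia.com/gpu".
command:
  - ./cluster-autoscaler
  - --gpu-total=nvidia.com/gpu:0:16   # assumed format <gpu_type>:<min>:<max>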

"Extended resources" (ie. any key in allocatable map that is not cpu, memory, nvidia gpu IIRC don't go through utilization threshold check. However, CA will only scale-down if all the pods running on node to be removed will be able to schedule on other nodes in the cluster. That check will take into account all scheduling constraints, including any extended resources.

Does Cluster Autoscaler rely on device plugins for handling nodes with GPUs?

I'm not sure I understand the question. Autoscaling is based on scheduling simulation, which takes into account all key/value pairs in pod resource requests and node allocatable. A device plugin is generally what sets the relevant allocatable value on the node object, so in that narrow sense CA is based on device plugins.
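
For instance, once a vendor's device plugin has registered, a node's status might look roughly like this (values are made up; the resource key comes from the plugin, not from CA), and it is exactly that allocatable key the simulation reads:

# Illustrative node status fragment after the device plugin registers the extended resource;
# CA and the scheduler only care about the key/quantity pairs here.
status:
  allocatable:
    cpu: "16"
    memory: 64Gi
    amd.com/gpu: "2"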

Recently, Dynamic Resource Allocation (DRA) has been introduced for GPU management.
Is my understanding correct that Cluster Autoscaler support for DRA is still in progress and does not work yet?

You're correct.
