What did you expect to happen?:
We're using Kueue/DWS to schedule workloads on nodes. The node auto-provisioner is supposed to create a new node pool dynamically and schedule the pod on it. This works well until we attach a generic ephemeral volume to the pod. In that case we expect the ephemeral volume controller to create the PVC and the cluster-autoscaler to provision a node so the Pod can be scheduled.
What happened instead?:
A PVC is created but stays `Pending` with the event message `waiting for pod ... to be scheduled`. Here's the output of `kubectl describe pvc/fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-primary-tmp`:
```
Name:          fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-primary-tmp
Namespace:     testing
StorageClass:  ephemeral-storage-sc
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2
Events:
  Type    Reason               Age                    From                         Message
  ----    ------               ----                   ----                         -------
  Normal  WaitForPodScheduled  2m3s (x26 over 8m17s)  persistentvolume-controller  waiting for pod fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2 to be scheduled
```
A ProvisioningRequest resource is also created but fails with the following status:
```yaml
status:
  conditions:
  - lastTransitionTime: "2024-10-02T11:15:47Z"
    message: 'Provisioning Request''s pods cannot be scheduled in the nodepool. Predicate checking errors: dws-a100-f750 (waiting for ephemeral volume controller to create the persistentvolumeclaim "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp"), dws-l4-17c5 (waiting for ephemeral volume controller to create the persistentvolumeclaim "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp"), nap-g2-standard-32-gpu1-1jh5m4vj (waiting for ephemeral volume controller to create the persistentvolumeclaim "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp")'
    observedGeneration: 1
    reason: ProvisioningRequestNotSchedulableInNodepool
    status: "False"
    type: Accepted
  - lastTransitionTime: "2024-10-02T11:17:46Z"
    message: 'Provisioning Request''s pods cannot be scheduled in the nodepool. Predicate checking errors: dws-a100-f750 (waiting for ephemeral volume controller to create the persistentvolumeclaim "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp"), dws-l4-17c5 (waiting for ephemeral volume controller to create the persistentvolumeclaim "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp"), nap-g2-standard-32-gpu1-1jh5m4vj (waiting for ephemeral volume controller to create the persistentvolumeclaim "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp")'
    observedGeneration: 1
    reason: ProvisioningRequestNotSchedulableInNodepool
    status: "True"
    type: Failed
```
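To check whether every candidate nodepool fails for the same reason, the predicate errors can be split apart with a few lines of Python. This is a hypothetical triage sketch, not part of any Kueue or cluster-autoscaler tooling: it assumes the message keeps the `<nodepool> (<reason>)` layout shown in the status above, and `pvcs_by_nodepool` is a name made up for illustration.

```python
import re

# Hypothetical triage helper: turn the ProvisioningRequest's "Predicate
# checking errors" message into {nodepool: PVC name it is waiting for}.
PREFIX = "Predicate checking errors: "
# One "<nodepool> (<reason>)" entry. Assumes nodepool names use only lowercase
# letters, digits and hyphens, and that reasons contain no ")".
ENTRY = re.compile(r"([a-z0-9-]+) \(([^)]*)\)")
PVC_IN_REASON = re.compile(r'persistentvolumeclaim "([^"]+)"')


def pvcs_by_nodepool(message: str) -> dict[str, str]:
    # Everything after the prefix is the comma-separated list of entries;
    # if the prefix is missing, partition() leaves `errors` empty.
    _, _, errors = message.partition(PREFIX)
    result = {}
    for pool, reason in ENTRY.findall(errors):
        match = PVC_IN_REASON.search(reason)
        if match:
            result[pool] = match.group(1)
    return result


# Rebuild the message from the status above (the reason repeats verbatim).
PVC = "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp"
REASON = f'waiting for ephemeral volume controller to create the persistentvolumeclaim "{PVC}"'
POOLS = ["dws-a100-f750", "dws-l4-17c5", "nap-g2-standard-32-gpu1-1jh5m4vj"]
MESSAGE = (
    "Provisioning Request's pods cannot be scheduled in the nodepool. "
    + PREFIX
    + ", ".join(f"{pool} ({REASON})" for pool in POOLS)
)

if __name__ == "__main__":
    for pool, pvc in pvcs_by_nodepool(MESSAGE).items():
        print(f"{pool}: {pvc}")
```

Every nodepool maps to the same missing PVC, which suggests the failure is not specific to a machine type.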
⚠️ What is odd: the PVC name in the status message does not match the PVC that was actually created. The ProvisioningRequest is waiting for a PVC called `pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0-primary-tmp`, while the created PVC is called `fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-primary-tmp`.
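The mismatch is consistent with how generic ephemeral volumes are named: per the Kubernetes documentation, the PVC is named `<pod name>-<volume name>`. The sketch below applies that rule to both pods. It assumes the volume in our manifest is named `primary-tmp` and that the pod checked by the ProvisioningRequest is named `pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0`; both are inferred from the PVC names above. `ephemeral_pvc_name` is a hypothetical helper written for illustration.

```python
# Hypothetical helper: applies the documented naming rule for PVCs created
# from generic ephemeral volumes, "<pod name>-<volume name>".
def ephemeral_pvc_name(pod_name: str, volume_name: str) -> str:
    return f"{pod_name}-{volume_name}"


# Assumption: the ephemeral volume in the Pod manifest is named "primary-tmp".
VOLUME_NAME = "primary-tmp"

# The real Pod, taken from the "Used By" field of the pending PVC.
REAL_POD = "fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2"

# Assumption: the pod the ProvisioningRequest checks, inferred by stripping
# "-primary-tmp" from the PVC name in its status message.
PROVREQ_POD = "pod-fpait3jw44nxh5-n7-0-n3-0-n2-0-n2-n1-2-a40d4-dws-prov-2-0-0"

created = ephemeral_pvc_name(REAL_POD, VOLUME_NAME)
expected = ephemeral_pvc_name(PROVREQ_POD, VOLUME_NAME)

if __name__ == "__main__":
    print("created: ", created)
    print("expected:", expected)
    print("match:   ", created == expected)
```

If this reading is right, the ProvisioningRequest checks a pod name that the ephemeral volume controller never creates a PVC for, so the predicate check can never pass.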
How to reproduce it (as minimally and precisely as possible):
As I said, we're using Kueue with GKE and attaching a generic ephemeral volume to the pod. Here's the relevant part of the Pod manifest:
Which component are you using?: cluster-autoscaler
What version of the component are you using?: cluster-autoscaler
Component version:
What k8s version are you using (`kubectl version`)?:
What environment is this in?:
GKE. We're using node auto-provisioning.
Anything else we need to know?: