Cluster Autoscaler not working on the new AL2023 EKS optimised AMI #6963

ashishrajora0808 commented Jun 23, 2024

Which component are you using?: cluster-autoscaler (image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0)

What version of the component are you using?:

Component version: v1.29.0

What k8s version are you using (kubectl version)?: 1.30

What environment is this in?: AWS

What did you expect to happen?: On trialling the Amazon Linux 2023 EKS optimised AMI, I expected things to just work, since the EKS worker nodes have all the permissions the Cluster Autoscaler needs to talk to the Auto Scaling groups (ASGs).

What happened instead?:
The Cluster Autoscaler errors out on startup, and the errors point to some sort of credentials or networking issue.

How to reproduce it (as minimally and precisely as possible):

  • Build the EKS cluster with the AL2023 EKS optimised AMI amazon/amazon-eks-node-al2023-x86_64-standard-1.27-v20240615.
  • Install Cluster Autoscaler v1.29.0 via Helm (a rough reconstruction of the install command is included after the logs below).
  • The logs show the errors below.

I0621 15:22:13.971945 1 aws_manager.go:79] AWS SDK Version: 1.48.7
I0621 15:22:13.972068 1 auto_scaling_groups.go:396] Regenerating instance to ASG map for ASG names: []
I0621 15:22:13.972083 1 auto_scaling_groups.go:403] Regenerating instance to ASG map for ASG tags: map[k8s.io/cluster-autoscaler/enabled: k8s.io/cluster-autoscaler/qa-ore-blue:]
E0621 15:24:14.262752 1 aws_manager.go:128] Failed to regenerate ASG cache: RequestError: send request failed
caused by: Post "https://autoscaling.us-west-2.amazonaws.com/": dial tcp: lookup autoscaling.us-west-2.amazonaws.com: i/o timeout
F0621 15:24:14.262782 1 aws_cloud_provider.go:460] Failed to create AWS Manager: RequestError: send request failed
caused by: Post "https://autoscaling.us-west-2.amazonaws.com/": dial tcp: lookup autoscaling.us-west-2.amazonaws.com: i/o timeout
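For context, the install used the upstream cluster-autoscaler Helm chart. The command below is a rough reconstruction based on the release metadata further down (chart 9.35.0, release cluster-autoscaler in kube-system, cluster name qa-ore-blue, region us-west-2), not the verbatim command that was run:

# reconstructed install command, values taken from the service account labels/annotations shown below
$ helm repo add autoscaler https://kubernetes.github.io/autoscaler
$ helm repo update
$ helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
    --namespace kube-system \
    --version 9.35.0 \
    --set autoDiscovery.clusterName=qa-ore-blue \
    --set awsRegion=us-west-2 \
    --set image.tag=v1.29.0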

Anything else we need to know?: I updated the AWS VPC CNI plugin as part of the investigation, but it did not help:

amazon-k8s-cni-init:v1.18.2
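(That version was read off the VPC CNI init container; assuming the default EKS daemonset name aws-node, it can be confirmed with something like:)

$ kubectl -n kube-system get daemonset aws-node \
    -o jsonpath='{.spec.template.spec.initContainers[0].image}'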

The Cluster Autoscaler service account on the new EKS AL2023 AMI looks like it is not loading any secrets or tokens. Not sure if this is the cause:

Name:                cluster-autoscaler-aws-cluster-autoscaler
Namespace:           kube-system
Labels:              app.kubernetes.io/instance=cluster-autoscaler
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=aws-cluster-autoscaler
                     app.kubernetes.io/version=1.29.0
                     helm.sh/chart=cluster-autoscaler-9.35.0
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::123456789987:role/*-eks-worker-role-ore
                     meta.helm.sh/release-name: cluster-autoscaler
                     meta.helm.sh/release-namespace: kube-system
Image pull secrets:
Mountable secrets:
Tokens:
Events:
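(Side note: on Kubernetes 1.24+ a ServiceAccount no longer gets an auto-generated token Secret, so the empty Mountable secrets/Tokens fields are expected on their own and may be a red herring.)

To narrow down whether this is node DNS/networking versus IRSA credentials, a rough check I can run (the busybox image is just an example; the label selector is taken from the service account labels above):

# can pods in the cluster resolve the Auto Scaling endpoint at all?
$ kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
    nslookup autoscaling.us-west-2.amazonaws.com
# are the IRSA env vars / token projected into the autoscaler pod?
$ kubectl -n kube-system get pod -l app.kubernetes.io/name=aws-cluster-autoscaler \
    -o jsonpath='{.items[0].spec.containers[0].env}'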

ashishrajora0808 added the kind/bug label Jun 23, 2024