Cluster Autoscaler not working on the new AL2023 EKS optimised AMI #6963

ashishrajora0808 commented Jun 23, 2024

Which component are you using?: cluster-autoscaler (image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0)

What version of the component are you using?:

Component version: v1.29.0

What k8s version are you using (kubectl version)?: 1.30

What environment is this in?: AWS

What did you expect to happen?: On trialling the Amazon Linux 2023 EKS optimised AMI, I expected things to just work, since the EKS worker nodes have all the permissions the Cluster Autoscaler needs to talk to the Auto Scaling groups (ASGs).

What happened instead?:
The Cluster Autoscaler errors out on startup, and the errors point to some sort of credentials or networking issue.

How to reproduce it (as minimally and precisely as possible):

  • Build the EKS cluster with the AL2023 EKS optimised AMI amazon/amazon-eks-node-al2023-x86_64-standard-1.27-v20240615.
  • Install Cluster Autoscaler v1.29.0 via Helm (a rough reconstruction of the install command is included after the logs below).
  • The logs show the errors below.

I0621 15:22:13.971945 1 aws_manager.go:79] AWS SDK Version: 1.48.7
I0621 15:22:13.972068 1 auto_scaling_groups.go:396] Regenerating instance to ASG map for ASG names: []
I0621 15:22:13.972083 1 auto_scaling_groups.go:403] Regenerating instance to ASG map for ASG tags: map[k8s.io/cluster-autoscaler/enabled: k8s.io/cluster-autoscaler/qa-ore-blue:]
E0621 15:24:14.262752 1 aws_manager.go:128] Failed to regenerate ASG cache: RequestError: send request failed
caused by: Post "https://autoscaling.us-west-2.amazonaws.com/": dial tcp: lookup autoscaling.us-west-2.amazonaws.com: i/o timeout
F0621 15:24:14.262782 1 aws_cloud_provider.go:460] Failed to create AWS Manager: RequestError: send request failed
caused by: Post "https://autoscaling.us-west-2.amazonaws.com/": dial tcp: lookup autoscaling.us-west-2.amazonaws.com: i/o timeout
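For context, the install used the upstream cluster-autoscaler Helm chart. The command below is a rough reconstruction based on the release metadata further down (chart 9.35.0, release cluster-autoscaler in kube-system, cluster name qa-ore-blue, region us-west-2), not the verbatim command that was run:

# reconstructed install command, values taken from the service account labels/annotations shown below
$ helm repo add autoscaler https://kubernetes.github.io/autoscaler
$ helm repo update
$ helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
    --namespace kube-system \
    --version 9.35.0 \
    --set autoDiscovery.clusterName=qa-ore-blue \
    --set awsRegion=us-west-2 \
    --set image.tag=v1.29.0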

Anything else we need to know?: I updated the AWS VPC CNI plugin as part of the investigation, but it did not help:

amazon-k8s-cni-init:v1.18.2
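(That version was read off the VPC CNI init container; assuming the default EKS daemonset name aws-node, it can be confirmed with something like:)

$ kubectl -n kube-system get daemonset aws-node \
    -o jsonpath='{.spec.template.spec.initContainers[0].image}'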

The Cluster Autoscaler service account on the new EKS AL2023 AMI looks like it is not loading any secrets or tokens. Not sure if this is the cause:

Name:                cluster-autoscaler-aws-cluster-autoscaler
Namespace:           kube-system
Labels:              app.kubernetes.io/instance=cluster-autoscaler
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=aws-cluster-autoscaler
                     app.kubernetes.io/version=1.29.0
                     helm.sh/chart=cluster-autoscaler-9.35.0
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::123456789987:role/*-eks-worker-role-ore
                     meta.helm.sh/release-name: cluster-autoscaler
                     meta.helm.sh/release-namespace: kube-system
Image pull secrets:
Mountable secrets:
Tokens:
Events:
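(Side note: on Kubernetes 1.24+ a ServiceAccount no longer gets an auto-generated token Secret, so the empty Mountable secrets/Tokens fields are expected on their own and may be a red herring.)

To narrow down whether this is node DNS/networking versus IRSA credentials, a rough check I can run (the busybox image is just an example; the label selector is taken from the service account labels above):

# can pods in the cluster resolve the Auto Scaling endpoint at all?
$ kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
    nslookup autoscaling.us-west-2.amazonaws.com
# are the IRSA env vars / token projected into the autoscaler pod?
$ kubectl -n kube-system get pod -l app.kubernetes.io/name=aws-cluster-autoscaler \
    -o jsonpath='{.items[0].spec.containers[0].env}'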

ashishrajora0808 added the kind/bug label Jun 23, 2024