
eksctl created cluster nodes regress to NotReady #1907

Closed
caprica opened this issue Mar 8, 2020 · 4 comments

caprica commented Mar 8, 2020

Apologies if this is not an eksctl issue and is instead related to my own environment, but after much fruitless research I am somewhat desperate.

I have just started using EKS and I set up a cluster with "eksctl create cluster" as per the README/guide. This all seems to work; I just used the default options.

I run "kubectl get nodes" and it reports the nodes in my cluster as Ready.

I leave the cluster alone; I don't change any config, nor do I deploy anything to it.

All is well until, around 30 minutes later, the nodes go to NotReady.

At the same time, the instance system logs start reporting lots of authorization failures (I use "journalctl -u kubelet" to watch them).

So the kubelet process has seemingly lost its authorization to interact with the cluster after some timeout.
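For reference, this is roughly how I am observing it (the exact flags are just how I happen to run the commands, nothing special):

```sh
# Node status flips from Ready to NotReady roughly 30 minutes after creation
kubectl get nodes --watch

# On the worker instance, the kubelet unit starts logging authorization failures
journalctl -u kubelet -f
```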

I have clearly missed something, but what?

caprica added the kind/help (Request for help) label Mar 8, 2020
sayboras (Contributor) commented Mar 9, 2020

Can you see the details of the authorization failure (e.g. the IAM role involved)? I had a similar issue, but it was due to kubernetes-sigs/aws-iam-authenticator#268; I did update the aws-auth ConfigMap after cluster creation.
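If it helps, you can check what is currently mapped with something like this (assuming the default aws-auth ConfigMap in kube-system; the cluster name is a placeholder):

```sh
# Dump the aws-auth ConfigMap to see which IAM roles/users are mapped
kubectl -n kube-system get configmap aws-auth -o yaml

# Or list the identity mappings via eksctl
eksctl get iamidentitymapping --cluster <cluster-name>
```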

caprica commented Mar 9, 2020

I see lots of authorisation failures. In the CloudWatch logs I spotted a failure saying the ARN is not mapped, and it gives the AmazonSSMRoleForInstancesQuickSetup role.

That role is the one returned by get-caller-identity on the instance.
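To illustrate, the output looks roughly like this (account id and instance id here are placeholders, not my real ones):

```console
$ aws sts get-caller-identity
{
    "UserId": "AROAEXAMPLEEXAMPLE:i-0123456789abcdef0",
    "Account": "111122223333",
    "Arn": "arn:aws:sts::111122223333:assumed-role/AmazonSSMRoleForInstancesQuickSetup/i-0123456789abcdef0"
}
```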

caprica commented Mar 9, 2020

I'm running some extended tests but I think I have resolved it.

I was thrown off by everything working properly at first, so I never bothered to map a different IAM role in the aws-auth config.

So it seems that when the cluster is created it somehow has an initial authorisation which is later permanently lost.

There are indeed warnings in the documentation about the need to map a role; I guess I should have taken those warnings more seriously.
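In case anyone else trips over this, the fix boiled down to mapping the role the instances actually assume into the aws-auth ConfigMap. Something along these lines (cluster name and account id are placeholders, and the username/groups are the standard node mapping, so double-check them against your own setup):

```sh
# Map the instance role so kubelet can authenticate to the cluster
eksctl create iamidentitymapping \
  --cluster <cluster-name> \
  --arn arn:aws:iam::111122223333:role/AmazonSSMRoleForInstancesQuickSetup \
  --username "system:node:{{EC2PrivateDNSName}}" \
  --group system:bootstrappers \
  --group system:nodes
```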

caprica closed this as completed Mar 9, 2020
sayboras (Contributor) commented Mar 9, 2020

It's fine, glad that you managed to solve it 👍
