node deletion fail cause the subsequent node operation fail #73

Open
mogliang opened this issue Nov 21, 2023 · 1 comment

@mogliang
Collaborator

During a KCP rolling update, when deleting a node times out, CAPI logs an error and proceeds to delete the Machine anyway.

E1121 02:20:51.361652       1 machine_controller.go:461] "Timed out deleting node" err="error deleting node mc2-control-plane-wvw2h: Delete \"https://mc2.mc2.akshybrid.io:6443/api/v1/nodes/mc2-control-plane-wvw2h?timeout=10s\": context deadline exceeded" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="cluster-mc2/mc2-control-plane-mv7zc" namespace="cluster-mc2" name="mc2-control-plane-mv7zc" reconcileID=b941098f-e4aa-463c-b390-7d15138a0a03 KThreesControlPlane="cluster-mc2/mc2-control-plane" Cluster="cluster-mc2/mc2" Node="mc2-control-plane-wvw2h"

However, because the cluster is then in an unhealthy state, the subsequent add-node operation is blocked.

Nov 21 02:49:37 mc2-control-plane-h9kvp k3s[5875]: time="2023-11-21T02:49:37Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Nov 21 02:49:37 mc2-control-plane-h9kvp k3s[5875]: time="2023-11-21T02:49:37Z" level=info msg="Adding member mc2-control-plane-h9kvp-b513763a=https://192.168.0.114:2380 to etcd cluster [mc2-control-plane-psph2-00576310=https://192.168.0.111:2380 mc2-control-plane-w>
Nov 21 02:49:37 mc2-control-plane-h9kvp k3s[5875]: {"level":"warn","ts":"2023-11-21T02:49:37.542Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00064e380/192.168.0.111:2>
Nov 21 02:49:37 mc2-control-plane-h9kvp k3s[5875]: time="2023-11-21T02:49:37Z" level=info msg="Waiting for other members to finish joining etcd cluster: etcdserver: unhealthy cluster"

To fix the issue, the k3s control plane provider should be able to pass the machineSpec nodeDeletionTimeout property through to the Machines it creates.

@mogliang
Collaborator Author

link to #62
