
kubernetes 1.30.5 support #23230

Open
karatkep opened this issue Nov 4, 2024 · 11 comments
Labels
area/che-server, area/dashboard, area/install, severity/P1, status/analyzing

Comments

@karatkep

karatkep commented Nov 4, 2024

Summary

Dear Community,

Could you please help me verify if Eclipse Che 7.93.0 supports Kubernetes 1.30.5? The che-dashboard and che pods stopped working when our Kubernetes cluster was updated to version 1.30.5.

Here is a sample of the error from the che-dashboard log:

ERROR[12:03:22 UTC]: HTTP request failed
    err: {
      "type": "le",
      "message": "HTTP request failed",
      "stack":
          HttpError: HTTP request failed
              at q._callback (/backend/server/backend.js:8:898957)
              at t._callback.t.callback.t.callback (/backend/server/backend.js:14:1087840)
              at q.emit (node:events:517:28)
              at q.<anonymous> (/backend/server/backend.js:14:1100418)
              at q.emit (node:events:517:28)
              at IncomingMessage.<anonymous> (/backend/server/backend.js:14:1099250)
              at Object.onceWrapper (node:events:631:28)
              at IncomingMessage.emit (node:events:529:35)
              at endReadableNT (node:internal/streams/readable:1400:12)
              at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
      "response": {
        "statusCode": 401,
        "body": {
          "kind": "Status",
          "apiVersion": "v1",
          "metadata": {},
          "status": "Failure",
          "message": "Unauthorized",
          "reason": "Unauthorized",
          "code": 401
        },
        "headers": {
          "audit-id": "6b14e1b5-8a08-41a8-a093-5e00693737a6",
          "cache-control": "no-cache, private",
          "content-type": "application/json",
          "date": "Mon, 04 Nov 2024 12:03:21 GMT",
          "content-length": "129",
          "connection": "close"
        },
        "request": {
          "uri": {
            "protocol": "https:",
            "slashes": true,
            "auth": null,
            "host": "10.1.0.1:443",
            "port": "443",
            "hostname": "10.1.0.1",
            "hash": null,
            "search": null,
            "query": null,
            "pathname": "/apis/org.eclipse.che/v2/checlusters",
            "path": "/apis/org.eclipse.che/v2/checlusters",
            "href": "https://10.1.0.1:443/apis/org.eclipse.che/v2/checlusters"
          },
          "method": "GET",
          "headers": {
            "Accept": "application/json",
            "Authorization": "Bearer MASKED"
          }
        }
      },
      "body": {
        "type": "Object",
        "message": "Unauthorized",
        "stack":
            
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Failure",
        "reason": "Unauthorized",
        "code": 401
      },
      "statusCode": 401,
      "name": "HttpError"
    }

The same issue affects the che pod. It appears that both lost access to the Kubernetes API after the upgrade to version 1.30.5.

ServiceAccounts, ClusterRoles, and ClusterRoleBindings are in place for both the che-dashboard and che pods.


@karatkep karatkep added the kind/question label Nov 4, 2024
@che-bot che-bot added the status/need-triage label Nov 4, 2024
@tolusha
Contributor

tolusha commented Nov 4, 2024

@karatkep
Could you show che pod logs?

I've tried to reproduce it on Minikube with Kubernetes 1.31.0, but no luck.

@ibuziuk ibuziuk added the area/install, severity/P1, status/analyzing, area/dashboard, and area/che-server labels and removed the kind/question and status/need-triage labels Nov 5, 2024
@karatkep
Author

karatkep commented Nov 6, 2024

@tolusha
According to the che logs, the pod starts receiving 401 errors from the kube-api exactly one hour after it launches:

06-Nov-2024 08:26:02.136 INFO [main] org.apache.catalina.startup.HostConfig.deployWAR Deployment of web application archive [/home/user/eclipse-che/tomcat/webapps/ROOT.war] has finished in [2,488] ms
06-Nov-2024 08:26:02.138 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
06-Nov-2024 08:26:02.144 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [40907] milliseconds
2024-11-06 09:26:32,950[c4d-k5x9l-37628]  [WARN ] [o.j.p.kubernetes.KUBE_PING 115]      - failed getting JSON response from Kubernetes Client[masterUrl=https://10.1.0.1:443/api/v1, headers={Authorization=#MASKED:1868#}, connectTimeout=5000, readTimeout=30000, operationAttempts=3, operationSleep=1000, streamProvider=org.jgroups.protocols.kubernetes.stream.TokenStreamProvider@6c199c1d] for cluster [RemoteSubscriptionChannel], namespace [eclipse-che], labels [app.kubernetes.io/component=che,app.kubernetes.io/instance=che,app.kubernetes.io/managed-by=che-operator,app.kubernetes.io/name=che,app.kubernetes.io/part-of=che.eclipse.org]; encountered [java.lang.Exception: 3 attempt(s) with a 1000ms sleep to execute [OpenStream] failed. Last failure was [java.io.IOException: Server returned HTTP response code: 401 for URL: https://10.1.0.1:443/api/v1/namespaces/eclipse-che/pods?labelSelector=app.kubernetes.io%2Fcomponent%3Dche%2Capp.kubernetes.io%2Finstance%3Dche%2Capp.kubernetes.io%2Fmanaged-by%3Dche-operator%2Capp.kubernetes.io%2Fname%3Dche%2Capp.kubernetes.io%2Fpart-of%3Dche.eclipse.org]]
2024-11-06 09:26:42,473[4c4d-k5x9l-3460]  [WARN ] [o.j.p.kubernetes.KUBE_PING 115]      - failed getting JSON response from Kubernetes Client[masterUrl=https://10.1.0.1:443/api/v1, headers={Authorization=#MASKED:1868#}, connectTimeout=5000, readTimeout=30000, operationAttempts=3, operationSleep=1000, streamProvider=org.jgroups.protocols.kubernetes.stream.TokenStreamProvider@f31944b] for cluster [WorkspaceStateCache], namespace [eclipse-che], labels [app.kubernetes.io/component=che,app.kubernetes.io/instance=che,app.kubernetes.io/managed-by=che-operator,app.kubernetes.io/name=che,app.kubernetes.io/part-of=che.eclipse.org]; encountered [java.lang.Exception: 3 attempt(s) with a 1000ms sleep to execute [OpenStream] failed. Last failure was [java.io.IOException: Server returned HTTP response code: 401 for URL: https://10.1.0.1:443/api/v1/namespaces/eclipse-che/pods?labelSelector=app.kubernetes.io%2Fcomponent%3Dche%2Capp.kubernetes.io%2Finstance%3Dche%2Capp.kubernetes.io%2Fmanaged-by%3Dche-operator%2Capp.kubernetes.io%2Fname%3Dche%2Capp.kubernetes.io%2Fpart-of%3Dche.eclipse.org]]
2024-11-06 09:26:47,468[c4d-k5x9l-46003]  [WARN ] [o.j.p.kubernetes.KUBE_PING 115]      - failed getting JSON response from Kubernetes Client[masterUrl=https://10.1.0.1:443/api/v1, headers={Authorization=#MASKED:1868#}, connectTimeout=5000, readTimeout=30000, operationAttempts=3, operationSleep=1000, streamProvider=org.jgroups.protocols.kubernetes.stream.TokenStreamProvider@5ed91d32] for cluster [WorkspaceLocks], namespace [eclipse-che], labels [app.kubernetes.io/component=che,app.kubernetes.io/instance=che,app.kubernetes.io/managed-by=che-operator,app.kubernetes.io/name=che,app.kubernetes.io/part-of=che.eclipse.org]; encountered [java.lang.Exception: 3 attempt(s) with a 1000ms sleep to execute [OpenStream] failed. Last failure was [java.io.IOException: Server returned HTTP response code: 401 for URL: https://10.1.0.1:443/api/v1/namespaces/eclipse-che/pods?labelSelector=app.kubernetes.io%2Fcomponent%3Dche%2Capp.kubernetes.io%2Finstance%3Dche%2Capp.kubernetes.io%2Fmanaged-by%3Dche-operator%2Capp.kubernetes.io%2Fname%3Dche%2Capp.kubernetes.io%2Fpart-of%3Dche.eclipse.org]]

@karatkep
Author

@tolusha, as far as I can see, the issue is that the token is not being refreshed: it is issued for one hour, and after that time the che-dashboard keeps using it despite its expiration. Is there any way to make the che-dashboard refresh the token before using it for kube-api calls?

@tolusha
Contributor

tolusha commented Nov 12, 2024

@karatkep
Could you share CheCluster CR?
What OIDC provider do you use?

@karatkep
Author

@tolusha,
Yes, of course, I will provide the CheCluster CR. However, I don't think the issue lies with the CheCluster CR or OIDC. The same Eclipse Che 7.93.0 was deployed to two identical AKS clusters (Kubernetes 1.27.9), and everything was fine until one of them was upgraded to 1.30.5; immediately after that update, the problems with the kube-api started. Inspecting the token used by, for example, the che-dashboard, I see that its expiration claim "exp" is always the same and lies in the past. From this I conclude that on Kubernetes 1.30.5 the token is not being refreshed.
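
For reference, this check can be repeated with a few lines of TypeScript. This is a minimal sketch, not Che's own code: it assumes Node.js (>= 16, for base64url) running inside the pod, and uses only the standard projected token mount.

    // Minimal sketch: decode the mounted service account token (a JWT) and
    // print its "iat"/"exp" claims to see whether the pod picks up rotation.
    import { readFileSync } from "fs";

    const token = readFileSync(
      "/var/run/secrets/kubernetes.io/serviceaccount/token",
      "utf8"
    ).trim();

    // A JWT is three base64url segments; the second one is the JSON payload.
    const payload = JSON.parse(
      Buffer.from(token.split(".")[1], "base64url").toString("utf8")
    );
    console.log("issued at:", new Date(payload.iat * 1000).toISOString());
    console.log("expires at:", new Date(payload.exp * 1000).toISOString());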

@karatkep
Author

karatkep commented Nov 12, 2024

@tolusha, @ibuziuk, we found the root cause of the issue. In Kubernetes 1.27.9, the token (located at /var/run/secrets/kubernetes.io/serviceaccount/token) is issued for one year, although it is refreshed every hour (more precisely, every 50 minutes). In Kubernetes 1.30.5, the token is issued for one hour and is likewise refreshed every 50 minutes. However, Che (che-dashboard, che, and most likely che-gateway) reads this token once at startup, caches it, and keeps using it. Consequently, on Kubernetes 1.27.9 there is no problem, since the cached token is valid for a year, but on Kubernetes 1.30.5 the problem begins one hour after startup, once the cached token has expired.
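
To make the fix direction concrete, here is a minimal TypeScript sketch (illustrative only, not Che's actual code; the in-cluster host and the checlusters path are taken from the error above): read the projected token from disk right before each Kubernetes API call instead of caching it once at startup, so the kubelet's ~50-minute rotation is always picked up.

    // Sketch: list CheCluster resources with a token read fresh per request.
    import { readFileSync } from "fs";
    import * as https from "https";

    const TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token";
    const CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt";

    // Re-read the projected token on every call instead of caching it at
    // startup; the kubelet rewrites the file before the old token expires.
    function currentToken(): string {
      return readFileSync(TOKEN_PATH, "utf8").trim();
    }

    function listCheClusters(): Promise<string> {
      return new Promise((resolve, reject) => {
        const req = https.request(
          {
            host: "kubernetes.default.svc",
            path: "/apis/org.eclipse.che/v2/checlusters",
            method: "GET",
            ca: readFileSync(CA_PATH),
            headers: {
              Accept: "application/json",
              Authorization: `Bearer ${currentToken()}`, // fresh token per call
            },
          },
          (res) => {
            let body = "";
            res.on("data", (chunk) => (body += chunk));
            res.on("end", () => resolve(body));
          }
        );
        req.on("error", reject);
        req.end();
      });
    }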

@tolusha
Contributor

tolusha commented Nov 13, 2024

@karatkep
So, if you restart all pods, Che will continue working, right?

@karatkep
Author

@tolusha
Correct, we need to restart the Che pods every hour to ensure they remain operational.

@karatkep
Author

@tolusha, @ibuziuk,
Could you please share the current status and plans for this issue? Is the root cause clear? Were you able to reproduce it? Are you already working on a fix, or planning to start soon?

Just to be on the same page: there is absolutely no pressure from my side; I only want to understand the current status and plans. For my part, I have already applied one possible workaround and written a CronJob that restarts the necessary Che pods (a sketch follows below). If other Eclipse Che users are facing, or will face, the same issue, I am more than willing to share this workaround.
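
For reference, such a workaround can look roughly like the sketch below. This is hypothetical, not my exact CronJob: the pod-restarter ServiceAccount and the deployment names are assumptions that must match your install, and the ServiceAccount needs RBAC permitting deployment patches in the eclipse-che namespace.

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: che-pod-restarter
      namespace: eclipse-che
    spec:
      schedule: "0 * * * *"   # hourly; keeps cached tokens from outliving their 1h lifetime for long
      jobTemplate:
        spec:
          template:
            spec:
              serviceAccountName: pod-restarter   # assumed SA with patch rights on deployments
              restartPolicy: OnFailure
              containers:
                - name: kubectl
                  image: bitnami/kubectl:latest
                  command:
                    - /bin/sh
                    - -c
                    - >
                      kubectl -n eclipse-che rollout restart
                      deployment/che deployment/che-dashboard deployment/che-gateway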

@ibuziuk ibuziuk moved this to 📅 Planned in Eclipse Che Team A Backlog Nov 15, 2024
@ibuziuk
Member

ibuziuk commented Nov 15, 2024

@karatkep Thank you for the follow-up and investigation details - #23230 (comment)

I'm still wondering whether the token lifetime is configurable on the k8s side in general.
Do you happen to have a link to the release notes, docs, or commit where this lifetime change was introduced? Could it be some AKS config?

The issue has been planned for the next sprint (Nov 20 - Dec 10); however, so far @tolusha has not been able to reproduce it on vanilla Minikube.

@karatkep also, contributions from the community are most welcome if you would like to change or update the token caching mechanism in the project ;-)

@karatkep
Author

@ibuziuk,
When I was researching this issue, I came across the documentation at https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#tokenrequest-api, which contains detailed information about configuring the token lifetime. I also ran an experiment: I disabled the che-operator (so it wouldn't interfere with my changes) and used expirationSeconds to set the token lifetime to one day (86400 seconds) in the che-dashboard Deployment. After restarting the che-dashboard pod, I confirmed that the lifetime of the token (located at /var/run/secrets/kubernetes.io/serviceaccount/token) had indeed changed.
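
For context, the relevant pod-spec fragment from that experiment looks like this (a sketch following the linked Kubernetes docs; the volume name is an assumption):

    # Projected service account token with a one-day lifetime instead of the
    # 1h default; "kube-api-access" is an assumed volume name.
    volumes:
      - name: kube-api-access
        projected:
          sources:
            - serviceAccountToken:
                path: token
                expirationSeconds: 86400   # 1 day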

P.S. Frankly speaking, I don't like the option of using a long-lived token; it contradicts security best practices. Whoever made this change (token lifetime: 1y -> 1h) took a step in the right direction toward short-lived tokens, and in my opinion a well-written application should not cache the token indefinitely.
