The kubelet HTTPS port is set to 0 for GKE Autopilot. This was done to fix this issue years ago, but I'm not sure why it was necessary. It seems it was meant to force the agent to fall back to the unsecured HTTP port 10255? Google is now deprecating the insecure read-only port and is emailing users to migrate to HTTPS on 10250.
I disabled the insecure port on our Autopilot cluster, and it broke the agent's connection to the kubelet, since HTTPS port 0 obviously doesn't work and the 10255 fallback is gone too.
I then went to manually override this in my datadog-values.yaml.
Nit: annoyingly, the top-level datadog.env value doesn't propagate down, because it's overridden by each agent container. I had to add it to agents.containers.agent.env, agents.containers.traceAgent.env, agents.containers.processAgent.env, etc.
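A rough sketch of what that per-container override could look like in datadog-values.yaml. The variable name is the one mentioned later in this issue, and the value 10250 is an assumption based on the port Google asks users to migrate to. The exact snippet used may have differed.

```yaml
# datadog-values.yaml (sketch, not the exact values used in this report)
# Assumption: DD_KUBERNETES_HTTPS_KUBELET_PORT is the setting being overridden,
# and 10250 is the target port. Values are quoted because env values are strings.
agents:
  containers:
    agent:
      env:
        - name: DD_KUBERNETES_HTTPS_KUBELET_PORT
          value: "10250"
    traceAgent:
      env:
        - name: DD_KUBERNETES_HTTPS_KUBELET_PORT
          value: "10250"
    processAgent:
      env:
        - name: DD_KUBERNETES_HTTPS_KUBELET_PORT
          value: "10250"
```

Repeating the entry under each container is what works around the top-level datadog.env value not propagating.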
Anyway, after doing this, I'm still getting failures:
2024-12-13 00:04:39 UTC | CORE | WARN | (comp/core/workloadmeta/impl/store.go:599 in func1) | error pulling from collector "kube_metadata": couldn't fetch "podlist": unexpected status code 403 on https://10.10.15.215:10250/pods: Forbidden (user=system:serviceaccount:ddagent:datadog-agent, verb=get, resource=nodes, subresource=proxy)
I thought this was an RBAC permissions issue with the agent's service account, surfaced by switching to the HTTPS port instead of the open read-only HTTP port.
Although nodes/proxy is supposed to be the correct resource to grant for access to the /pods endpoint, granting this permission is not allowed in GKE Autopilot:
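For reference, the permission named in the 403 (verb=get, resource=nodes, subresource=proxy) would correspond to an RBAC rule roughly like the one below. This is only a sketch; the role name is hypothetical and not taken from the chart.

```yaml
# Sketch of a ClusterRole granting the permission named in the 403 above.
# The metadata.name is a hypothetical placeholder.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent-kubelet
rules:
  - apiGroups: [""]              # core API group
    resources: ["nodes/proxy"]   # nodes resource, proxy subresource
    verbs: ["get"]
```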
If your workload uses the /pods endpoint on the insecure kubelet read-only port, you need to grant the nodes/proxy RBAC permission to access the endpoint on the secure kubelet port. nodes/proxy is a powerful permission that you can't grant in GKE Autopilot clusters and that you shouldn't grant in GKE Standard clusters. Use the Kubernetes API with a fieldSelector for the node name instead.
Ah, so that's why the HTTPS port was bypassed for GKE Autopilot in the first place.
The core issue here may be on the agent side, but I think it's worth raising here too, since the workaround (setting DD_KUBERNETES_HTTPS_KUBELET_PORT=0) will soon no longer be supported by GKE. And if the agent does get updated to use the Kubernetes API instead of the kubelet, the RBAC roles will still have to be updated here.
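As a sketch of the kind of RBAC change that might be needed: if the agent listed pods through the Kubernetes API with a field selector (a request along the lines of `GET /api/v1/pods?fieldSelector=spec.nodeName=<node-name>`), its service account would need read access to pods rather than nodes/proxy. The role name below is hypothetical, and the verbs are an assumption about what such an implementation would use.

```yaml
# Sketch only: read access to pods via the API server, as an alternative to nodes/proxy.
# The metadata.name is hypothetical; the verbs depend on how the agent would query the API.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent-pod-reader
rules:
  - apiGroups: [""]              # core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
```

Note that RBAC cannot restrict a grant by field selector, so a rule like this allows reading pods on all nodes even if the agent only ever asks for its own.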
Hello @tkoft, thank you for raising this issue. We are aware of this upcoming deprecation and are working with Google and other Datadog engineering teams towards a solution, since, as you note, nodes/proxy cannot be granted, which prevents us from using the HTTPS kubelet port. Until that work is completed, the insecure port should remain enabled on GKE Autopilot clusters to keep full Agent functionality.