Cluster domain "cluster.local" is hardcoded #15311
Hello @pisto 👋 First of all, thank you for trying the operator and opening this issue!
It is breaking the deployment because in the generated `config.yaml` the service hostnames are FQDNs under cluster.local, which do not resolve in a cluster with a different domain.

The cluster domain is generally configured directly in the DNS resolution stack of k8s (CoreDNS, cloud platform, ...), and the root issue here is that it is in general not discoverable. In OpenShift you have the DNS Operator and you can query the configured domain from it.

As I mentioned in my first message, the easiest solution is to avoid generating FQDN hostnames at all, and just truncate them up to the .svc part. As far as I know, there is no real requirement or advantage in using the FQDN, except for marginally faster resolution (if you don't use an FQDN you rely on search domains, which means multiple DNS queries are sent out until a result is found).

Fortunately, in my testing I managed to catch this blocker early in development, thanks to the fact that our minikube development setup uses a custom domain.
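To illustrate the search-domain point: this is roughly what a pod's `/etc/resolv.conf` looks like on a typical cluster (the namespace and nameserver IP here are made up):

```
# Sketch of a pod's /etc/resolv.conf in namespace "loki" on a cluster
# with the default "cluster.local" domain (nameserver IP is illustrative):
search loki.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
```

A short name like `query-frontend.loki.svc` has fewer dots than `ndots`, so the resolver tries each search suffix in turn until the `cluster.local` entry produces a match; the same short name keeps working unchanged on a cluster with a different domain, which is exactly why truncating at .svc is domain-agnostic.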
Hey @pisto and thanks for the question. May I ask what the use-case for changing the cluster domain is? There have already been quite a few deployments of Loki managed with the Operator and we have so far not encountered this issue. Granted, most of them were based on OpenShift; customization might be more prevalent in the other flavors of Kubernetes. From my point of view the cluster domain is only for "cluster internal" usage and would not need to be customized for interaction with "outside DNS", so I'm interested in why your use-case requires customization of that domain.
Non-default cluster domains are used in multi-cluster deployments with inter-cluster connectivity, or in systems where the k8s DNS resolution system is integrated with a broader DNS zone.

My specific use case is GKE (GCP managed Kubernetes): we have multiple clusters installed in the same cross-project VPC. In this mode, each cluster must (hence the blocker) have a different cluster domain. This configuration allows pods in one cluster to use VPC-native connectivity to talk to pods and services in other clusters, and k8s DNS names for any cluster can be resolved seamlessly from any other cluster (or from any other VPC workload, actually). For more information, see https://cloud.google.com/kubernetes-engine/docs/how-to/cloud-dns#vpc_scope_dns and the sketch below.

Summing it up: I understand that most commonly, and apparently in all OpenShift environments, the cluster domain is cluster.local. However, it is perfectly legal per the Kubernetes specs to have any cluster domain, and cluster.local is just an example value that happens to be very common. In my opinion the inability to set the domain is a bug, and the fix is a low-hanging one.
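For reference, a hedged sketch of how such a cluster is created per the doc linked above (the cluster name and domain are made up; the `--cluster-dns*` flags are the ones documented for VPC-scope Cloud DNS):

```sh
# Each cluster in the shared VPC gets its own, unique cluster domain:
gcloud container clusters create cluster-one \
  --cluster-dns=clouddns \
  --cluster-dns-scope=vpc \
  --cluster-dns-domain=cluster-one.gke.internal
```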
Sorry for being a bit slow to respond; I needed to focus on a different topic for a few days. Thanks for providing that context, I think I now get how it is used and that it's an actual use-case. I have to admit that it had not occurred to me, and I still think I would prefer to use the cluster-internal DNS only for cluster-internal communication, but I can also see the advantages of your approach. I had used GKE before, but that was before that feature existed 🙂 I think we'll have a look at it, and if it's as easy as your suggestion makes it seem without breaking the existing deployments, then this looks like a probable change. 👍 And to give a bit of context myself: most of us will have a shorter or longer break over the next two weeks, so I wouldn't expect many updates over that time. 🌴
After jumping through a number of hoops, I have hit a blocker to the deployment of the operator. I believe the issue is common to all clusters, including OpenShift (I am testing on minikube).
It appears that the cluster domain is hardcoded here. It also appears that there is no way to override Loki `config.yaml` values, which would solve this issue and most likely many other potential issues caused by the rigidity of the configuration generated by the operator.

A solution would be to reference all services without the FQDN, just by the short .svc name (see the sketch below). I cannot foresee any issue with that, as I don't believe that the operator components access services external to the namespace where the stack is installed.
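To make the proposal concrete, this is what the change would mean for a memberlist address in the generated config (`memberlist.join_members` is standard Loki configuration; the service and namespace names here are hypothetical):

```yaml
memberlist:
  join_members:
    # current FQDN form, breaks when the cluster domain is not cluster.local:
    - lokistack-gossip-ring.my-namespace.svc.cluster.local:7946
    # domain-relative form proposed here, resolved via the pod's DNS search list:
    - lokistack-gossip-ring.my-namespace.svc:7946
```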
Additional places where `cluster.local` is hardcoded: the handling of the certificates for the webhooks, here and here. The webhooks are successfully reached in my minikube installation with a custom cluster domain, which means that the apiserver is using and validating the short name, not the FQDN, so it appears that the FQDN names can just be removed.
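For reference, the serving certificate for a webhook typically carries DNS SANs like the following (a hypothetical sketch in cert-manager style; the actual service and namespace names depend on the deployment):

```yaml
dnsNames:
  - loki-operator-webhook-service.loki-operator-ns.svc
  - loki-operator-webhook-service.loki-operator-ns.svc.cluster.local  # hardcoded suffix
```

Since the apiserver dials the webhook through the short `<service>.<namespace>.svc` name, only the first entry is exercised, which is consistent with dropping the hardcoded FQDN.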