
💡How to access ETCD from mgmt cluster? #75

Open
mogliang opened this issue Nov 22, 2023 · 10 comments

Comments

@mogliang (Collaborator) commented Nov 22, 2023

k3s does not run etcd as a static pod; it runs inside the k3s host process, which means we cannot use Kubernetes port-forward to reach the etcd endpoint. I want to start this thread to discuss how we can access the k3s etcd endpoint from the CAPI management cluster.

Below are some ideas I have in mind:

  1. Access the etcd endpoint directly via the VM address.

This is the most straightforward way. The CAPI Machine has a MachineAddresses field that exposes the VM address info, so the management cluster can open a connection to the machine address on port 2379 to reach the etcd client endpoint (a rough sketch follows at the end of this comment).

The big drawback is the extra network requirement: the network must be routable between the management cluster and the target cluster VMs. It also exposes a larger attack surface, so users may need to take more care to keep the system secure.

  2. DaemonSet on the target cluster's CP nodes to forward etcd traffic.

We deploy a DaemonSet to the target cluster's CP nodes that does one simple job: forward traffic on port 2379 to its host's port 2379. The management cluster can then still use Kubernetes port-forward to reach the DaemonSet pod, which forwards again to the host etcd endpoint. Theoretically this should work, but I haven't tested it yet.

The drawback is also obvious: it needs pods deployed on the target cluster, which is not very user-friendly.

Although the k3s CP provider can still work without etcd access, it is not safe; we need to fix this issue to make the project production ready. Please comment below to share your ideas⚡
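
As a rough sketch (not from the codebase) of what option 1 could look like in Go, assuming the etcd CA plus a client cert/key pair are available on the management side, the placeholder address comes from the Machine's addresses, and port 2379 is reachable:

// Sketch only: dial etcd directly over the machine address (option 1).
// The cert file names and <machine-address> are placeholders.
package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	caPEM, err := os.ReadFile("etcd-server-ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	clientCert, err := tls.LoadX509KeyPair("etcd-client.crt", "etcd-client.key")
	if err != nil {
		panic(err)
	}

	cli, err := clientv3.New(clientv3.Config{
		// Taken from the Machine's addresses in a real controller.
		Endpoints:   []string{"https://<machine-address>:2379"},
		DialTimeout: 5 * time.Second,
		TLS: &tls.Config{
			RootCAs:      pool,
			Certificates: []tls.Certificate{clientCert},
		},
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// A member list is the usual "can we talk to etcd" smoke test.
	members, err := cli.MemberList(context.TODO())
	fmt.Println(members, err)
}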

@mogliang (Collaborator, Author) commented:

cc @zawachte

@mogliang (Collaborator, Author) commented:

The doc says we can use the apiserver as a proxy to access nodes!? Does anyone have more details on this?
https://kubernetes.io/docs/concepts/cluster-administration/proxies/

The apiserver proxy:

is a bastion built into the apiserver
connects a user outside of the cluster to cluster IPs which otherwise might not be reachable
runs in the apiserver processes
client to proxy uses HTTPS (or http if apiserver so configured)
proxy to target may use HTTP or HTTPS as chosen by proxy using available information
can be used to reach a Node, Pod, or Service
does load balancing when used to reach a Service
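
For reference, a rough client-go sketch of the service proxy path (the service name, port and kubeconfig path here are assumptions, and since etcd normally requires client-certificate auth that the apiserver proxy does not present, this only illustrates the proxy mechanism rather than a verified way to reach etcd):

// Sketch only: the apiserver proxy path for a Service is
//   /api/v1/namespaces/<ns>/services/<scheme>:<name>:<port>/proxy/<path>
// client-go exposes it via ProxyGet. Names, port and kubeconfig path are placeholders.
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "workload.kubeconfig")
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// GET .../api/v1/namespaces/kube-system/services/https:etcd-service:2379/proxy/health
	raw, err := cs.CoreV1().
		Services("kube-system").
		ProxyGet("https", "etcd-service", "2379", "health", nil).
		DoRaw(context.TODO())
	fmt.Println(string(raw), err)
}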

@lukebond (Contributor) commented Nov 23, 2023

I'm not familiar with the apiserver proxy, so I can't shed any light on it. The docs are quite thin; they mention it can be used to access nodes but don't show you how. kubectl cluster-info seems to list only the services in the kube-system namespace on one of our clusters that I'm looking at. I think using services isn't useful without adding a DaemonSet like you propose. I'd be interested to know how the node access works if anyone knows. Going in via the k8s API is attractive as it doesn't require extra networking configuration. For option 1 to work for us we'd need to add 2379 to the CP security group; easy enough to do, but an added burden for users of MHC that might not be obvious.

@zawachte (Collaborator) commented Nov 24, 2023

I think we should be able to manually create an Endpoints resource and a Service resource for etcd, which should allow us to do the same tunneling (port-forward) that the kubeadm provider does.

This is similar to how prom-operator sets up prom to scrape kubelet/cadvisor metrics.

@lukebond (Contributor) commented:

Ah, so something like this?

  • create an Endpoints resource in the kube-system namespace, with a subsets entry for each CP host and a port entry for 2379
  • use the API proxy linked above to access it

Given that you can point Endpoints subsets at nodes, this might be what the docs meant when they say you can use API proxies to communicate with nodes. Well, you can, indirectly, via a Service, if you create one by hand like that. The docs seem a bit unclear, if that's what they meant!

@mogliang (Collaborator, Author) commented Nov 28, 2023

I tested the solution (Service + Endpoints) that zawachte suggested on my k3s environment. The YAML is like below:

apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-service-host1
  namespace: default
subsets:
- addresses:
  - ip: [host ip here]
  ports:
  - name: https
    port: 2380
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-service-host1
  namespace: default
spec:
  ports:
  - name: https
    port: 2380
    protocol: TCP
    targetPort: 2380
  type: ClusterIP

Yes, a pod was able to access the host etcd via the Service's clusterIP. However, port-forward is not allowed for this Service; port-forwarding gives the error below:
error: cannot attach to *v1.Service: invalid service 'etcd-service': Service is defined without a selector

I searched a bit, and someone gave an explanation and solution: https://stackoverflow.com/questions/56870808/access-external-database-resource-wrapped-in-service-without-selector.

In short: port-forwarding to a Service needs a backing pod, and that is still true today. The solution so far is to create a pod to act as a proxy.

@mogliang (Collaborator, Author) commented:

I tested deploying a pod as an etcd proxy and using port-forwarding to connect, and it works! The proxy YAML is as below:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: etcd-proxy
  namespace: kube-system
  labels:
    app: etcd-proxy
spec:
  selector:
    matchLabels:
      name: etcd-proxy
  template:
    metadata:
      labels:
        name: etcd-proxy
    spec:
      nodeSelector:
        node-role.kubernetes.io/etcd: "true"
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: etcd-proxy
        image: alpine/socat
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        args:
        # listen on 2380 inside the pod and relay connections to the same port on the host IP
        - TCP4-LISTEN:2380,fork,reuseaddr
        - TCP4:$(HOSTIP):2380
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log

We can let the k3s bootstrap provider install this DaemonSet.
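
For reference, a rough sketch of how the management side could then tunnel to one of these etcd-proxy pods through the workload cluster's apiserver (the pod name and kubeconfig path are placeholders; CAPI's own etcd client tunnels in a similar way through its internal proxy dialer instead of opening a local listener):

// Sketch only: port-forward to an etcd-proxy pod, then talk to etcd via 127.0.0.1.
package main

import (
	"fmt"
	"net/http"
	"os"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/portforward"
	"k8s.io/client-go/transport/spdy"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "workload.kubeconfig")
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Port-forward subresource of one pod of the DaemonSet (placeholder name).
	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace("kube-system").
		Name("etcd-proxy-abcde").
		SubResource("portforward")

	transport, upgrader, err := spdy.RoundTripperFor(cfg)
	if err != nil {
		panic(err)
	}
	dialer := spdy.NewDialer(upgrader, &http.Client{Transport: transport}, http.MethodPost, req.URL())

	stopCh, readyCh := make(chan struct{}), make(chan struct{})
	fw, err := portforward.New(dialer, []string{"2380:2380"}, stopCh, readyCh, os.Stdout, os.Stderr)
	if err != nil {
		panic(err)
	}
	go func() { _ = fw.ForwardPorts() }()
	<-readyCh

	// 127.0.0.1:2380 is now relayed by the socat pod to the host's 2380;
	// an etcd client can connect here the same way as in the option 1 sketch.
	fmt.Println("forwarding ready")
	close(stopCh)
}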

Guys, what do you think? If it looks good, I will proceed to work on a PR.

@zawachte (Collaborator) commented Nov 30, 2023

I am good with moving forward with this approach for now. I think this should be the job of the control-plane provider, not the bootstrap provider.

We only support etcd+k3s today, but we need to ensure that we add this in a way where we have an etcd mode that installs this DaemonSet and manages etcd, while also leaving the door open for us to develop modes that:

  • are a no-op for external DBs.
  • do sqlite management.
  • actually manage the external DBs (this is probably out of scope for this project).
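
A purely hypothetical sketch of what that split could look like on the control-plane provider side (all names invented for illustration, not existing code):

package controlplane

import "context"

// DatastoreManager is one possible shape for the "modes" idea: one
// implementation per datastore type (embedded etcd, sqlite, external DB).
type DatastoreManager interface {
	// ReconcileDatastore sets up whatever the mode needs on the workload
	// cluster; for embedded etcd that would include the etcd-proxy DaemonSet
	// and an etcd client, while the sqlite and external-DB modes mostly no-op.
	ReconcileDatastore(ctx context.Context) error

	// ForgetMember cleans up per-machine datastore state on scale-down or
	// remediation (etcd member removal; a no-op for sqlite/external DBs).
	ForgetMember(ctx context.Context, nodeName string) error
}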

@mogliang (Collaborator, Author) commented Dec 5, 2023

There are several things that need to be done. I'll try to break them into several PRs so that they are easier to review and test.

  1. Generate and set etcd CA certs in the bootstrap controller when using embedded etcd. We'll need the CA for auth when connecting to etcd later. https://github.com/cluster-api-provider-k3s/cluster-api-k3s/pull/78/commits
  2. Install the etcd proxy DaemonSet when using embedded etcd.
  3. Copy the proxy and etcd internal packages from CAPI, and implement the etcd connect logic.
  4. Copy and fix the etcd operation methods from CAPI, and integrate them with the upgrade & remediation logic.

@mogliang (Collaborator, Author) commented Apr 2, 2024

Update:
We recently discussed this with the k3s folks, and it seems we have another option. I drafted a proposal here; please help review and comment. Thanks!

#97
