High CPU and Memory Load on single node k3s cluster #5769
Replies: 40 comments · 9 replies
-
I'm not able to reproduce this on a local KVM server with the same specs: Ubuntu 20.04 x86_64, 4 cores / 8 GB RAM. I do see that CPU and memory utilization increase and use ~2 cores while Rancher is installing, but it settles down once all the pods are running. The only difference is that I haven't used LetsEncrypt, since my dev node is not exposed to the internet. I will note that my node itself is on a 10.0.0.0/8 network, so it is allowed to talk to itself via those allow rules, despite the fact that the normal k3s ports have not been exposed via UFW. What address space are you using for your node?

Install:
```sh
echo "y" | ufw enable
ufw allow ssh
ufw allow http
ufw allow https
ufw default deny incoming
ufw default allow outgoing
ufw allow from any to 10.0.0.0/8
ufw allow from 10.0.0.0/8 to any

export INSTALL_K3S_VERSION=v1.19.2+k3s1; curl -sfL https://get.k3s.io | sh -
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
helm repo add jetstack https://charts.jetstack.io
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
kubectl create namespace cert-manager
KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install cert-manager jetstack/cert-manager --wait --namespace cert-manager --version v1.0.2 --set installCRDs=true
kubectl create namespace cattle-system
KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install rancher rancher-latest/rancher --wait --namespace cattle-system --set replicas=1 --set hostname=ubuntu01.lan.khaus
```

kubectl get nodes -o wide:

kubectl get pods -A:

top:
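For anyone comparing setups, a quick way to answer the "what address space is your node using" question and to verify that the UFW rules above actually permit node-local traffic is something along these lines (only a sketch; nothing here is specific to the reporter's host):

```sh
# Addresses as Kubernetes sees them
kubectl get nodes -o wide

# Addresses assigned to the host's interfaces
ip -4 addr show

# The UFW rule set that is currently active
sudo ufw status verbose
```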
-
Can you confirm the version you're running? The process name (k3s-server) in your top output is not what I would expect to see. What do you get from
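The command being asked about is missing from the quoted text above; assuming it is the k3s version that is wanted, it can be checked with something like:

```sh
# Report the k3s build and the Kubernetes version of each node
k3s --version
kubectl get nodes -o wide
```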
-
Hi @brandond, hi @philipp1992. I have the same issue.
kubectl get nodes -o wide:
kubectl get pods -A:
Very high load averages, and on the 4-node RKE cluster too, which is added to the single-node Rancher k3s cluster.
-
@risayew 25% of one core and ~1 GB of RAM is on the high end of normal for k3s, but not out of line for what I'd expect while hosting Rancher, which is pretty hard on the apiserver due to all the custom resources. @philipp1992 reported k3s consistently using more than 100% CPU and 3+ GB of memory, which is unusual.
-
Thanks for the answer. And what do you think about load averages? They increase over time; I have seen values even above 12. I have such values because Kubernetes was upgraded to 1.18.9 just a few minutes ago.
-
Load average doesn't really correspond to much in terms of resource utilization. There's a reason it's not used for scheduling. Just keep an eye on your CPU and memory.
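As a concrete sketch of what watching CPU and memory (rather than load average) can look like, assuming the bundled metrics-server is enabled and k3s was installed as a systemd service:

```sh
# Per-node and per-pod usage as reported by the bundled metrics-server
kubectl top nodes
kubectl top pods -A

# CPU/memory of the k3s server process itself
top -p "$(pgrep -f 'k3s server' | head -n1)"
```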
-
Which version were you using before? Am I crazy if I say that I didn't have any high CPU or memory usage on k3s 1.17?
-
@unixfox That's interesting. The previous Kubernetes version was 1.18.8. Maybe the CPU utilization doesn't have anything to do with k3s or Rancher, but with the Kubernetes version.
-
I encountered this problem too: after repeatedly creating and deleting resources (deployments, jobs, etc.), memory increased continuously. k3s version: container image rancher/k3s:v1.16.14-k3s1
-
Additionally, I observe etcd going down and leader changes far too often. Does anybody have an idea where I should start looking?
-
@risayew In my experience, excessive storage latency is the usual cause of etcd leader changes.
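As a rough way to quantify "excessive storage latency", the fio benchmark commonly recommended for etcd disks measures fdatasync latency. This is only a sketch; it assumes fio is installed and that the scratch directory is on the same disk that backs the datastore (the path below is a placeholder):

```sh
# Choose a scratch directory on the same disk that backs the datastore
TEST_DIR=/var/lib/etcd-disk-test   # placeholder path
mkdir -p "$TEST_DIR"

# Write 22 MiB in 2300-byte chunks, calling fdatasync after every write,
# and report the sync latency percentiles etcd is sensitive to.
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory="$TEST_DIR" --size=22m --bs=2300 --name=etcd-disk-check
```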
-
I have noticed this before on longer-running clusters, but for me it was indicative of something else going wrong in my cluster, most often a pod that was stuck in a perpetual CrashLoopBackOff state. Over the last few days I've tried to reproduce this by running a continuous script that would add a few deployments, pods, and services and then delete them every few seconds, but I still did not notice any increase in CPU or memory. I was using 2 CPUs, 4 GB RAM, and a 10 GB disk, and saw my node consistently utilizing approximately 13% CPU and 41% memory with minor fluctuations. These numbers were grabbed from the scenario using v1.19.2+k3s1, with Rancher deployed on the cluster.
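The churn script itself isn't included in the comment; a minimal sketch of that kind of test (names, counts, and sleep intervals here are made up) could look like:

```sh
#!/usr/bin/env bash
# Continuously create and delete a few deployments and services to see
# whether k3s CPU/memory climbs over time. Names and counts are arbitrary.
set -euo pipefail

while true; do
  for i in 1 2 3; do
    kubectl create deployment "churn-${i}" --image=nginx
    kubectl expose deployment "churn-${i}" --port=80
  done
  sleep 10
  for i in 1 2 3; do
    kubectl delete service "churn-${i}" --ignore-not-found
    kubectl delete deployment "churn-${i}" --ignore-not-found
  done
  sleep 10
done
```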
-
Looks like this bug has been fixed with k3s-io/kine@3faf3a7 in kine 0.5.1, hasn't it?
-
While Kine does now compact more efficiently, that was not a significant contributor towards CPU or memory utilization. It was mostly just slow with external datastores due to the need to frequently call out to the SQL server during the compaction process.
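If it helps anyone reading later, compaction activity shows up in the k3s service logs, so you can get a rough sense of how often it runs and how long it takes (this assumes k3s runs as a systemd unit named k3s):

```sh
# Filter the k3s service logs for compaction messages
journalctl -u k3s --no-pager | grep -i compact | tail -n 20
```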
-
We've added some resource profiling figures to our documentation, feel free to take a look:
-
I am experiencing the same problem on a Raspberry Pi 3B+. k3s uses so much CPU that the device is barely functional. k3s is using about 30-50% of CPU on a relatively fresh install of Ubuntu Server 20.04.3 LTS 64-bit. I can't even run any k3s commands. It just hangs.
-
I have a similar issue. To find the root cause I tried a deployment with all components disabled:
and still got CPU load of about 10%.
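The exact flags used above aren't shown; purely as an illustration, disabling the packaged add-ons on a k3s server typically looks something like this (the component list is my assumption, not the commenter's actual configuration):

```sh
# Install a k3s server with the bundled add-ons disabled (illustrative only)
curl -sfL https://get.k3s.io | sh -s - server \
  --disable traefik \
  --disable servicelb \
  --disable metrics-server \
  --disable local-storage
```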
-
Seeing this on a fresh Ubuntu 22.04 install.
-
I can't find a similar issue on k0s though; I wonder what they are doing differently 🤔
-
That's what I noticed as well. Is it expected?
-
Chances are good this is caused by etcd. Back in the day I observed and tested its performance and reliability: not only is it a resource hog that was never designed to run on low-power machines, it also has some deeply embedded bugs with regard to ARM. I'm not sure what the current state of that is, but I would highly recommend not using a multi-master setup with etcd and instead using something like this (which can, for example, be combined with a bare-metal Postgres cluster): https://github.com/alexellis/k3sup#create-a-multi-master-ha-setup-with-external-sql Bottom line: a multi-master setup will always have performance overhead, even at idle, since the masters must stay in sync in case one goes down. Also, if you want maximum performance, don't expect it out of the box. You need to know what you can deploy and what you do deploy, and you need to understand the technology involved, especially in a complex clustering environment.
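For reference, pointing k3s at an external SQL datastore instead of embedded etcd is done with the --datastore-endpoint flag; a minimal sketch, assuming a reachable PostgreSQL instance (the host, credentials, and database name below are placeholders):

```sh
# Run the k3s server against an external PostgreSQL datastore instead of
# embedded etcd. Host, credentials, and database name are placeholders.
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="postgres://k3s:changeme@db.example.internal:5432/k3s"
```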
-
I have the same issue. I have reinstalled k3s on the OS several times, and it always uses over 2 GB of memory. I believe it would work if I reinstalled the OS, but I want to find a safe way to clear the k3s resources instead of reinstalling the OS. Here is the last thing I found:

```
kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   73m          3%     2110Mi          57%

free -m
        total   used   free   shared   buff/cache   available
Mem:     3677    973    513        2         2189        2411
Swap:       0      0      0
```
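On the "safe way to clear k3s" question: the install script drops killall and uninstall helpers alongside the binary, so a full OS reinstall shouldn't be necessary (the paths below are the install script's defaults):

```sh
# Stop k3s and clean up its containers and network without uninstalling
/usr/local/bin/k3s-killall.sh

# Remove k3s entirely, including its data under /var/lib/rancher/k3s
/usr/local/bin/k3s-uninstall.sh
```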
-
Same with me: Ubuntu 22.04, v1.25.8-rc1+k3s1.
It was installed by running:
-
This is still an issue. I experience the same symptoms as described in this article on 2 out of 3 of my k3s/Rancher installs: https://canthonyscott.com/k3s-cpu-usage-bug-remains/ . Indeed, the CPU usage slowly increases over time. All of them run:
The issue is present on my two Arch Linux installs on btrfs, and etcd is known for misbehaving on btrfs. Is anybody else experiencing the issue on this filesystem?
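A quick way to check whether the k3s data directory is actually on btrfs, and to disable copy-on-write for the database directory (a commonly suggested mitigation for databases on btrfs; the path is the k3s default and this is only a sketch, not an endorsed fix):

```sh
# Show which filesystem backs the k3s data directory
stat -f -c %T /var/lib/rancher/k3s

# Disable copy-on-write for the database directory; only affects files
# created after the flag is set, so it is best done before first start.
sudo chattr +C /var/lib/rancher/k3s/server/db
```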
-
Hi, how do I get this information?
-
Potentially the same issue: #7786. My issue got moved to a discussion since there was no direct indication that it was a k3s bug; however, reading this discussion, I get the feeling something about k3s must be broken.
-
I'm confused. On a single-node cluster, why is etcd even running? The default datastore is SQLite, and yet I see both SQLite and etcd in my logs.
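One rough way to see which datastore a node is actually using is to look at what exists under the server's db directory (these are the default k3s paths; treat this as a sanity check rather than an official procedure):

```sh
# SQLite (kine) datastore, the default for a single server
ls -lh /var/lib/rancher/k3s/server/db/state.db 2>/dev/null

# Embedded etcd datastore, used when the server was started with --cluster-init
ls -ld /var/lib/rancher/k3s/server/db/etcd 2>/dev/null

# The startup logs also state which datastore was chosen
journalctl -u k3s | grep -iE 'kine|etcd' | head -n 20
```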
-
I'm experiencing the same issue with versions 1.28.5 and 1.29.1. The node barely has any load, but k3s is constantly eating ~25% of the CPU. Any hint on how to debug this?
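Not an authoritative answer, but a starting point for narrowing down where the CPU goes is to compare what the cluster is doing with what the process itself reports (this assumes the metrics-server add-on is enabled and k3s runs as a systemd unit named k3s):

```sh
# See which nodes and pods are actually consuming CPU
kubectl top nodes
kubectl top pods -A --sort-by=cpu | head -n 15

# Watch the k3s logs for noisy controllers or crash-looping pods
journalctl -u k3s -f

# Per-thread CPU of the k3s server process
top -H -p "$(pgrep -f 'k3s server' | head -n1)"
```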
-
Update for anyone else: it might be related to garbage collection. Adding the following to your config.yaml may help.

```yaml
## /etc/rancher/k3s/config.yaml
etcd-compaction-interval: "240m"
kube-controller-manager-arg:
  - "--concurrent-gc-syncs=5"
etcd-cron-compaction-time: "0 2 * * *"
```
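If you try this, note that changes to config.yaml only take effect after the service is restarted; on a default install that is:

```sh
# Apply the new /etc/rancher/k3s/config.yaml
sudo systemctl restart k3s
```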
-
Hi,
I have an Ubuntu 20.04 machine with k3s installed. After installing Rancher, the memory and CPU consumption of the k3s service skyrockets.
Versions:
- "Ubuntu 20.04.1 LTS"
- export INSTALL_K3S_VERSION=v1.19.2+k3s1; curl -sfL https://get.k3s.io | sh -
The server running k3s is a normal x86 vServer with 4 cores and 8 GB of RAM.
Any idea what could cause the high load?
Kind regards
Philipp