
Extreme CPU Usage on small cluster #10396

Closed
Bonn93 opened this issue Jun 24, 2024 · 3 comments

Bonn93 commented Jun 24, 2024

Environmental Info:
K3s Version:

k3s version v1.30.1+k3s1 (80978b5b)
go version go1.22.2

Node(s) CPU architecture, OS, and Version:

Linux k3s-server.internal.self-hosted.io 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Apr 4 18:13:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 server, 2 agents

Describe the bug:
The k3s server has extremely high CPU usage; even scaling pods back to 0 makes no change. A 4-core/16 GB RAM machine sits at a high load average and often spikes to 50-60!

load average: 13.42, 20.74, 22.08

Steps To Reproduce:
k3s has been running for a while with the upgrade controller/operator.

Expected behavior:
CPU usage stays within sane limits.

Actual behavior:
Extremely high CPU usage, slow API response times and often timeouts. Appears to spam /var/log/messages with trace logging.

Additional context / logs:
Nodes have local NVMe drives. The k3s-server process is using 400% CPU, and there are no other large processes on the system. It shows high %usr time with about 20% sys/kernel time. The state.db is also ~12 GB and SQLite fails to vacuum. Cluster age is 270d.
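
For reference, roughly how the vacuum was attempted (a sketch assuming the default k3s data directory; the state.db path may differ on other installs):

```bash
# Stop k3s first so nothing is writing to the datastore
sudo systemctl stop k3s

# Try to reclaim space in the kine-backed SQLite datastore
# (default location for a standard k3s server install)
sudo sqlite3 /var/lib/rancher/k3s/server/db/state.db 'VACUUM;'

sudo systemctl start k3s
```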

Bonn93 commented Jun 24, 2024

Okay, this is certainly SQLite related. After migrating to etcd by adding --cluster-init to the systemd unit, the cluster has returned to normal levels; however, the Trace: logging is still present.

The server is now at a 0.4 one-minute load average and responding to requests quickly. The SQLite vacuum did not do anything, but migrating to etcd did. Setting -v=0 in the systemd unit doesn't seem to have any effect on the logs either.
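
For anyone doing the same conversion, roughly the steps I followed (a sketch assuming the standard install-script layout; adjust paths and arguments to your setup):

```bash
# Stop the server
sudo systemctl stop k3s

# Add --cluster-init to the server arguments, e.g. in
# /etc/systemd/system/k3s.service:
#   ExecStart=/usr/local/bin/k3s server --cluster-init -v=0
# On restart, k3s migrates the existing SQLite datastore to embedded etcd.
sudo systemctl daemon-reload
sudo systemctl start k3s

# Sanity-check that etcd is now the datastore
journalctl -u k3s | grep -i etcd | tail
sudo k3s kubectl get nodes
```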

Happy to close as the root issue seems clear.

Bonn93 closed this as completed Jun 24, 2024
paketb0te commented

@Bonn93 we are running into a similar issue - I found that reinstalling the cluster immediately improved the situation (it doesn't matter whether I reinstall with or without the --cluster-init flag to enable etcd).
Did using etcd as the datastore fix the issue for you in the long run?

Bonn93 commented Feb 12, 2025

Yeah, I've bootstrapped a few smaller clusters recently, and the ones with a non-HA (SQLite-backed) control plane all end up like this in the long run. Adding the --cluster-init flag and converting to etcd fixes it, and those converted clusters have all stayed healthy long-term.

I just bootstrap with single-node etcd now. I don't think SQLite is the best choice.
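
Roughly what that looks like at install time (a sketch using the standard install script; flags as documented for embedded etcd):

```bash
# Single-server install backed by embedded etcd instead of SQLite
curl -sfL https://get.k3s.io | sh -s - server --cluster-init
```

The same can also be set via `cluster-init: true` in /etc/rancher/k3s/config.yaml before installing.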
