-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extreme CPU Usage on small cluster #10396
Comments
Okay, this is certainly SQLLite related. Migrating to etcd the cluster has returned to normal levels by adding The server is now at 0.4 1m load average and responding to requests quickly. The SQLLite vacuum did not do anything, but migrating to etcd did. Trying to set Happy to close as the root issue seems clear. |
@Bonn93 we are running into a similar issue - I found that reinstalling the cluster immediately improved the situation (doesn't matter if I reinstall with or without the |
Yeah, I've bootstrapped a few smaller clusters recently and the smaller ones with the non ha control plane all get like this in the long run. Doing the init flag and converting to etcd fixes and the other clusters long run are all happy. I just bootstrap with etcd single node now. I don't think sqllite is the best choice. |
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
1 server, 2 agents
Describe the bug:
k3s server has extremely high CPU usage, even scaling pods back to 0 there's no change. A 4 core/16GB RAM machine has a load average of and often spikes to 50/60!
Steps To Reproduce:
k3s has been running for a while with the upgrade controller/operator.
Expected behavior:
CPU is within sane numbers
Actual behavior:
Extremely high CPU usage, slow API response times and often timeouts. Appears to spam /var/log/messages with
trace
logging.Additional context / logs:
Nodes have NVMe local drives. The
k3s-server
process is using 400%, there's no other large processes on the system. It has high %USR with about 20% SYS/Kernel time. The state.db is also 12GB~ and SQLLite fails to vacuum. Cluster age is 270d.The text was updated successfully, but these errors were encountered: