More volumes are causing more IO on RKE2 server nodes #8543
Replies: 4 comments 12 replies
-
Can anyone confirm whether this is normal behaviour, or do I have to dig deeper to see what is causing the higher load? Thanks
-
Hello @serhiynovos, do you mean the number of write operations has increased since installing MinIO? If so, it is likely due to MinIO, and you can investigate why MinIO is writing data to the volumes so frequently. This increase is independent of the type of storage used.
-
We have https://longhorn.io/kb/kubernetes-resource-revision-frequency-expectations/ describing the expectations for API server PUT operations in a stable cluster. In that investigation, we focused on update frequency rather than I/O throughput, but it may help explain the behavior you are observing. There is an improvement planned in #8076 that should help lower the update frequency for engine and volume objects. If you have Prometheus active in the cluster, it may help to run queries like we did in #8114 (comment) to see whether your numbers are in line with the "worst case" scenario for a stable cluster in the knowledge base (i.e. 12 PUTs per minute per engine resource and 12 PUTs per minute per volume resource).
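As a concrete starting point, assuming the cluster exposes the standard `apiserver_request_total` metric (the label names below are the usual upstream ones; adjust them to match your scrape config), a query along these lines shows the PUT rate per Longhorn resource type. 12 PUTs per minute per object works out to about 0.2 PUTs per second per object:

```
sum by (resource) (
  rate(apiserver_request_total{verb="PUT", group="longhorn.io", resource=~"engines|volumes"}[5m])
)
```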
-
@ejweber BTW, I just noticed that on worker nodes where Longhorn has replicas, I see a constant write rate of about 600 kB/s. iostat and iotop show that the writing is done mostly by Longhorn processes to the disk and mount point scheduled for volumes, at about 20-30 kB/s for each PVC. As I now have about 30 replicas on this node, I suppose this is the expected behavior and write rate.
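A quick sanity check on the numbers reported above: if each of the ~30 replicas writes roughly 20-30 kB/s, the aggregate should bracket the observed node-level rate. A minimal sketch using those reported figures:

```python
# Figures taken from the comment above: ~30 replicas on the node,
# each writing roughly 20-30 kB/s per PVC according to iostat/iotop.
replicas = 30
per_replica_low_kbps = 20
per_replica_high_kbps = 30

total_low = replicas * per_replica_low_kbps    # aggregate lower bound, kB/s
total_high = replicas * per_replica_high_kbps  # aggregate upper bound, kB/s

print(f"expected aggregate write rate: {total_low}-{total_high} kB/s")
```

The observed ~600 kB/s sits at the low end of the 600-900 kB/s range, so the per-replica and node-level measurements are consistent with each other.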
-
I'm managing an RKE2 cluster with Longhorn 1.6.1 in an on-prem setup. Yesterday, I installed MinIO using 9 volumes, each configured without replication. I've noticed an increase in the write rate. Although the increase is modest compared to what we saw on the previous version, 1.5.4, our future plans involve deploying approximately 300 additional volumes, most with a replication factor of 3. I'm concerned about the potential I/O load on my server/etcd nodes. Could Longhorn handle this increased load without prematurely wearing out the SSDs on the server nodes? Is this behavior normal, or should I be looking into potential issues?
This is my current state with replicas:
And this is how the IO rate changed after installing MinIO with a total of 9 replicas:
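To get a feel for the scale of the planned deployment, a rough worst-case estimate of the API server PUT traffic can be derived from the figures in the Longhorn knowledge-base article cited elsewhere in this thread (12 PUTs per minute per engine object and 12 per volume object in a stable cluster). This sketch assumes one engine object per attached volume; replica count does not enter the calculation, since the KB figures are per engine/volume resource:

```python
# Worst-case stable-cluster estimate for ~300 additional volumes, using the
# per-object PUT rates from the Longhorn KB article (12/min each for the
# engine and volume resources; one engine per attached volume assumed).
volumes = 300
puts_per_min_per_volume = 12   # volume resource updates
puts_per_min_per_engine = 12   # engine resource updates

total_puts_per_min = volumes * (puts_per_min_per_volume + puts_per_min_per_engine)
print(total_puts_per_min, "PUTs/min =", total_puts_per_min / 60, "PUTs/s")
```

That works out to on the order of 120 PUTs/s against the API server in the worst case, which is the kind of sustained etcd write load worth measuring before the rollout; the improvement tracked in #8076 aims to reduce these per-object rates.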