ETCD 3.5.5 showing ReadIndex response took too long #17289

hrid21 · 2024-01-22T12:41:45Z


Many `waiting for ReadIndex response took too long` messages appear in the logs, yet the metrics do not show any issues with disk or network.

```
2024-01-09T17:11:56.231+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:56.732+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:57.232+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:57.698+00:00;info;dced;;;ca38df874f7fa0fe received MsgPreVoteResp from ca38df874f7fa0fe at term 650
2024-01-09T17:11:57.698+00:00;info;dced;;;ca38df874f7fa0fe is starting a new election at term 650
2024-01-09T17:11:57.698+00:00;info;dced;;;ca38df874f7fa0fe [logterm: 650, index: 641096] sent MsgPreVote request to 9bef648454c1c4b4 at term 650
2024-01-09T17:11:57.698+00:00;info;dced;;;ca38df874f7fa0fe [logterm: 650, index: 641096] sent MsgPreVote request to b347aa03fa74d545 at term 650
2024-01-09T17:11:57.698+00:00;info;dced;;;ca38df874f7fa0fe became pre-candidate at term 650
2024-01-09T17:11:57.733+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:58.234+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:58.561+00:00;info;dced;;;trace[1449326621] range
2024-01-09T17:11:58.561+00:00;warning;dced;;;apply request took too long
2024-01-09T17:11:58.735+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:58.762+00:00;info;dced;;;trace[1758792213] range
2024-01-09T17:11:58.762+00:00;warning;dced;;;[core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2024-01-09T17:11:58.762+00:00;warning;dced;;;apply request took too long
2024-01-09T17:11:59.198+00:00;info;dced;;;ca38df874f7fa0fe received MsgPreVoteResp from ca38df874f7fa0fe at term 650
2024-01-09T17:11:59.198+00:00;info;dced;;;ca38df874f7fa0fe [logterm: 650, index: 641096] sent MsgPreVote request to 9bef648454c1c4b4 at term 650
2024-01-09T17:11:59.198+00:00;info;dced;;;ca38df874f7fa0fe [logterm: 650, index: 641096] sent MsgPreVote request to b347aa03fa74d545 at term 650
2024-01-09T17:11:59.198+00:00;info;dced;;;ca38df874f7fa0fe is starting a new election at term 650
2024-01-09T17:11:59.198+00:00;info;dced;;;ca38df874f7fa0fe became pre-candidate at term 650
2024-01-09T17:11:59.235+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:59.523+00:00;warning;dced;;;apply request took too long
2024-01-09T17:11:59.524+00:00;info;dced;;;trace[1298228346] range
2024-01-09T17:11:59.524+00:00;warning;dced;;;[core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2024-01-09T17:11:59.736+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:59.980+00:00;info;dced;;;trace[1167299795] range
2024-01-09T17:11:59.980+00:00;warning;dced;;;apply request took too long
2024-01-09T17:11:59.980+00:00;warning;dced;;;[core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2024-01-09T17:12:00.208+00:00;warning;dced;;;apply request took too long
2024-01-09T17:12:00.208+00:00;warning;dced;;;[core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2024-01-09T17:12:00.208+00:00;info;dced;;;trace[191081109] range
2024-01-09T17:12:00.237+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:12:00.375+00:00;warning;dced;;;apply request took too long
```
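To make the pattern in an excerpt like this easier to see, here is a minimal Python sketch that counts entries per log level and measures the gap between consecutive ReadIndex warnings. It assumes the semicolon-delimited layout shown above (`timestamp;level;component;;;message`); the helper names `parse_line` and `summarize` are hypothetical, and the sample is just the first few lines of the excerpt, not a complete log.

```python
from collections import Counter
from datetime import datetime, timedelta

# A few lines copied from the log excerpt above (not the full log).
SAMPLE = """\
2024-01-09T17:11:56.231+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:56.732+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:57.232+00:00;warning;dced;;;waiting for ReadIndex response took too long, retrying
2024-01-09T17:11:57.698+00:00;info;dced;;;ca38df874f7fa0fe is starting a new election at term 650
"""

READ_INDEX_MSG = "waiting for ReadIndex response took too long"


def parse_line(line):
    """Split one 'timestamp;level;component;;;message' line; return None if it does not match."""
    parts = line.strip().split(";", 5)
    if len(parts) != 6:
        return None
    timestamp, level, _component, _, _, message = parts
    return datetime.fromisoformat(timestamp), level, message


def summarize(text):
    """Count entries per level and list the gaps (in ms) between consecutive ReadIndex warnings."""
    entries = [e for e in map(parse_line, text.splitlines()) if e is not None]
    levels = Counter(level for _, level, _ in entries)
    stamps = [ts for ts, _, msg in entries if msg.startswith(READ_INDEX_MSG)]
    gaps_ms = [(b - a) // timedelta(milliseconds=1) for a, b in zip(stamps, stamps[1:])]
    return levels, gaps_ms


if __name__ == "__main__":
    levels, gaps_ms = summarize(SAMPLE)
    print(dict(levels))
    print(gaps_ms)
```

On the sample lines, the ReadIndex warnings are spaced roughly 500 ms apart, which matches the timestamps in the rest of the excerpt.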

The etcd documentation mentions that backend commit, round-trip time, and WAL fsync durations should be less than 25 ms, 50 ms, and 10 ms, respectively.

From the metrics, the p99 values are:
- backend commit: 1 ms
- round-trip time: 12 ms
- WAL fsync duration: 8 ms
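The comparison can be written out as a small Python sketch. The thresholds and p99 values are the ones quoted in this post; the dictionary keys are illustrative labels rather than actual etcd metric names, and `over_threshold` is a hypothetical helper.

```python
# Thresholds (ms) as quoted from the etcd documentation in this post.
THRESHOLDS_MS = {"backend_commit": 25, "round_trip": 50, "wal_fsync": 10}

# Observed p99 values (ms) reported in this post.
OBSERVED_P99_MS = {"backend_commit": 1, "round_trip": 12, "wal_fsync": 8}


def over_threshold(observed, limits):
    """Return {name: (observed, limit)} for every value at or above its limit."""
    return {
        name: (value, limits[name])
        for name, value in observed.items()
        if value >= limits[name]
    }


if __name__ == "__main__":
    violations = over_threshold(OBSERVED_P99_MS, THRESHOLDS_MS)
    print(violations or "all p99 values are below their thresholds")
```

With the numbers above, nothing is flagged, although the WAL fsync p99 of 8 ms sits fairly close to its 10 ms limit.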

Since these metrics look fine, why are the ReadIndex warnings still appearing in the logs?

jmhbnz · 2024-01-24T03:48:18Z

Maintainer

Hey @hrid21 - Thanks for your question. Can you please confirm the CPU load metrics for your etcd cluster machines?

Another reason for performance issues with etcd can be CPU starvation. If monitoring of the machine's CPU usage shows heavy utilization, there may not be enough compute capacity for etcd.
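As a rough first check on a cluster machine, a Python sketch like the one below compares the 1-minute load average to the number of CPUs. This is only a coarse proxy for CPU starvation, not an etcd-specific diagnostic; the `is_cpu_saturated` helper and the ratio of 1.0 are assumptions for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def is_cpu_saturated(load_avg, cpu_count, ratio=1.0):
    """Return True when the load average per CPU meets or exceeds `ratio` (an assumed cutoff)."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count >= ratio


if __name__ == "__main__":
    one_minute, _five, _fifteen = os.getloadavg()  # Unix-only
    cpus = os.cpu_count() or 1
    print(f"1-minute load {one_minute:.2f} on {cpus} CPUs")
    print("possibly CPU-starved" if is_cpu_saturated(one_minute, cpus) else "CPU load looks moderate")
```

A sustained high value here would be a reason to look at per-process CPU usage and at proper monitoring data before drawing conclusions.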
