
Change etcd liveness probes to the new livez and readyz endpoints #3039

Closed
siyuanfoundation opened this issue Apr 1, 2024 · 11 comments · Fixed by kubernetes/kubernetes#124465
Assignee: neolit123
Labels: area/etcd, kind/feature, priority/important-longterm
Milestone: v1.31

Comments

@siyuanfoundation

This is related to etcd-io/etcd#16007

Since v3.5.11, etcd has added new /livez and /readyz HTTP endpoints.
The design for the new probes is documented in etcd livez and readyz probes.

The new probes are Kubernetes API compliant, have more fine-grained checks, and are better suited for liveness and readiness checks than /health?exclude=NOSPACE&serializable=true and /health?serializable=false.

| Probe | Endpoint used by kubeadm | Endpoint proposed to use | Difference |
| --- | --- | --- | --- |
| LivenessProbe | `/health?exclude=NOSPACE&serializable=true` | `/livez` | `/livez` does not check for a leader, because restarting the local server does not mitigate the problem of the cluster missing a leader, while a crash loop would worsen the problem. |
| StartupProbe | `/health?serializable=false` | `/readyz` | `/readyz` ignores the NOSPACE alarm, because if the local server is out of quota, it is still able to take read/delete requests, and still able to forward write requests to the leader to be written successfully in the cluster. |
| ReadinessProbe | None | `/readyz` | |

We should switch the current etcd LivenessProbe and StartupProbe to the new endpoints which are Kubernetes API compliant.
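For illustration, the proposed mapping can be sketched as a small lookup (a hedged sketch with hypothetical helper names — kubeadm itself generates these probes in Go; the endpoint strings are taken from the table above):

```python
# Illustrative sketch only, not kubeadm's actual implementation.
# Endpoint strings come from the comparison table above.

OLD_PROBES = {
    "livenessProbe": "/health?exclude=NOSPACE&serializable=true",
    "startupProbe": "/health?serializable=false",
    # kubeadm currently configures no readinessProbe for etcd
}

NEW_PROBES = {
    "livenessProbe": "/livez",
    "startupProbe": "/readyz",
    "readinessProbe": "/readyz",
}

def probe_path(probe, etcd_version):
    """Pick the probe endpoint, preferring the new livez/readyz
    endpoints when the etcd version supports them (>= 3.5.11)."""
    if etcd_version >= (3, 5, 11):
        return NEW_PROBES.get(probe)
    return OLD_PROBES.get(probe)
```

For example, `probe_path("livenessProbe", (3, 5, 12))` would select `/livez`, while the same probe against etcd 3.5.10 would keep the old `/health` query.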

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

@siyuanfoundation
Author

siyuanfoundation commented Apr 1, 2024

cc @serathius @ahrtr

@neolit123
Member

> Since v3.5.11, etcd has added new livez/readyz HTTP endpoints

Q: is there a reason etcd released this feature in a patch release? perhaps it was considered a bugfix.

kubeadm 1.30 (to be released) is using etcd 3.5.12
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/constants/constants.go#L481-L491

so for kubeadm 1.31 we can switch to the new probe method.
it might make more sense to switch in the kubeadm MINOR version that starts using 3.6 so that our users will know a bigger change happened, but perhaps this is not needed.

generally more opinions on this topic are appreciated.
cc @SataQiu @pacoxu

@neolit123 neolit123 added the priority/important-longterm, kind/feature, and area/etcd labels Apr 2, 2024
@neolit123 neolit123 added this to the v1.31 milestone Apr 2, 2024
@neolit123
Member

neolit123 commented Apr 2, 2024

> /readyz

do we expect any behavior changes if we add a readiness probe for etcd in kubeadm?
currently it indeed doesn't have a readiness probe.

@pacoxu
Member

pacoxu commented Apr 2, 2024

Since we backport etcd minor version to previous releases as well, it's quite safe to change the probe in v1.31.

https://github.com/kubernetes/kubernetes/blob/release-1.27/cmd/kubeadm/app/constants/constants.go#L481-L486

@pacoxu
Member

pacoxu commented Apr 2, 2024

@SataQiu
Member

SataQiu commented Apr 2, 2024

+1 for applying the new livez/readyz probe in v1.31

> do we expect any behavior changes if we add a readiness probe for etcd in kubeadm?
> currently it indeed doesn't have a readiness probe.

That seems safe, because we already check the etcd cluster health via the client and have adopted a retry mechanism.

@ahrtr
Member

ahrtr commented Apr 2, 2024

> Q: is there a reason etcd released this feature in a patch release? perhaps it was considered a bugfix.

I raised the same question/comment when I was reviewing the livez/readyz design doc. Actually I wasn't a big fan of backporting it to stable releases, but I was not strongly against it, for the following reasons:

- It's backward compatible; it doesn't change the existing /health endpoint.
- It isn't a core part of etcdserver.
- It's strongly desired by some people in the community, although no strong justification was provided.

> so for kubeadm 1.31 we can switch to the new probe method.
> it might make more sense to switch in the kubeadm MINOR version that starts using 3.6 so that our users will know a bigger change happened, but perhaps this is not needed.
>
> generally more opinions on this topic are appreciated.

I believe there isn't any behaviour change in etcd's 3.5 and 3.4 patch releases in terms of the new /livez and /readyz probes. Specifically, /livez is equivalent to /health?exclude=NOSPACE&serializable=true, and /readyz is equivalent to /health?serializable=false.

We may extend both /livez and /readyz in main (3.6.0), but as mentioned above, there shouldn't be any behaviour or breaking change in 3.5 and 3.4.

It's totally up to kubeadm / cluster-life-cycle's maintainers / leads to decide when to use the new /livez and /readyz endpoints. But please do not expect we can get etcd 3.6.0 released soon.

> Since we backport etcd minor version to previous releases as well, it's quite safe to change the probe in v1.31.

I think they are orthogonal changes.

> BTW, do we plan to add the ReadinessProbe?

We can reuse /readyz for the startupProbe. If you or anyone insists on adding a dedicated endpoint for it, let's discuss it under etcd-io/etcd#16007.

@neolit123
Member

> It's totally up to kubeadm / cluster-life-cycle's maintainers / leads to decide when to use the new /livez and /readyz endpoints. But please do not expect we can get etcd 3.6.0 released soon.

ok, seems like this can be added in kubeadm 1.31 once code freeze for 1.30 is over.
thanks for the info

@neolit123
Member

neolit123 commented Apr 23, 2024

@siyuanfoundation @ahrtr any notable regressions in >= 3.5.11 that may keep users on < 3.5.11?
also i assume we really do recommend users move to the latest PATCH of 3.5 due to CVE fixes, correct?

i have a WIP PR for this here, but it does not do version checking, and it will cause an error for kubeadm users that are deploying a custom etcd version < 3.5.11 on kubeadm 1.31. i guess etcd will just fail to start.

to be on the safe side the PR could also include some version checking, but that would not work for custom image strings, as some users pass a SHA for the etcd image. it would only work for users that are using a semver in the etcd image tag.

i think my preference would be to just have this mentioned in the release notes, and also push kubeadm users stuck on an older custom etcd version to finally upgrade!

kubernetes/kubernetes#124465
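The SHA-vs-semver caveat above can be sketched as follows (a hedged illustration with hypothetical names, not kubeadm's actual code): images pinned by digest carry no parseable version, so any version check has to fall back to a default for them.

```python
import re

# Hypothetical sketch of version gating for the probe switch.
# An image tag like "registry.k8s.io/etcd:3.5.12-0" can be parsed as
# semver; a digest reference like "...etcd@sha256:..." cannot.

SEMVER_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")

def supports_livez(image, assume_when_unknown=True):
    """Return True if the image tag parses as etcd >= 3.5.11.

    Digest-pinned images carry no version info, so we fall back to
    `assume_when_unknown` for them.
    """
    # A digest reference has no usable tag; skip parsing entirely.
    tag = "" if "@" in image else image.rsplit(":", 1)[-1]
    m = SEMVER_RE.match(tag)
    if not m:
        return assume_when_unknown
    return tuple(int(g) for g in m.groups()) >= (3, 5, 11)
```

With `assume_when_unknown=True` this matches the preference stated above: digest-pinned custom images get the new endpoints, and the release note carries the warning for users still on < 3.5.11.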

@neolit123 neolit123 self-assigned this Apr 23, 2024
@siyuanfoundation
Author

@neolit123 Thanks for working on this!
I don't think there are any notable regressions in >= 3.5.11.
I like the idea of having this mentioned in the release notes, and making kubeadm users stuck on an older custom etcd version finally upgrade!

@ahrtr
Member

ahrtr commented Apr 23, 2024

> @siyuanfoundation @ahrtr any notable regressions in >= 3.5.11 that may keep users on < 3.5.11?
> also i assume we really do recommend users move to the latest PATCH of 3.5 due to CVE fixes, correct?

No regression in >= 3.5.11.

Yes, it's recommended to bump etcd to the latest patch.
