
etcd - All grpc_code for grpc_method "Watch" is "Unavailable" #20311

Open
Reamer opened this issue Jul 13, 2018 · 22 comments
Labels: lifecycle/frozen, sig/master

@Reamer

Reamer commented Jul 13, 2018

Hi,
I noticed that every grpc_code for grpc_method "Watch" is "Unavailable" in my OKD cluster. My plan is to monitor the etcd instances with the default Prometheus alerts from the etcd project.
Maybe the watch connection is not closed correctly and runs into a timeout.

Version
Client Version: 4.7.18
Server Version: 4.7.0-0.okd-2021-08-22-163618
Kubernetes Version: v1.20.0-1093+4593a24e8fd58d-dirty
Steps To Reproduce
  1. Install OKD 4.7
  2. Switch to the etcd project: oc project openshift-etcd
  3. Log in to the first etcd member: oc rsh etcd-master1.mycompany.com
  4. Fetch the metrics: curl -s --cacert "/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt" --cert "/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master1.mycompany.com.crt" --key "/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master1.mycompany.com.key" https://localhost:2379/metrics
Current Result
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 1434
Expected Result
grpc_server_handled_total{grpc_code="OK",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 1434

Additional Information

If that behavior is already fixed or it's a false positive, let me know.

@jwforres
Member

@openshift/sig-master

@Reamer
Author

Reamer commented Aug 9, 2018

Still present with OpenShift 3.10:

oc v3.10.0+0c4577e-1
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://s-cp-lb-01.cloud.example.de:443
openshift v3.10.0+7eee6f8-2
kubernetes v1.10.0+b81c8f8

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2018
@vsliouniaev

+1 on this. We've disabled this alert on our setup because it's just flapping and not indicating any failures.

@Reamer
Author

Reamer commented Nov 7, 2018

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2018
@gaopeiliang

+1 on this. I also see it on the etcd cluster master nodes after adding etcd3_alert.rules.

[screenshot]

The alert cycles roughly every five minutes, but we can't find anything wrong with Kubernetes.
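
For context, the etcd example alert rules fire when the share of non-OK responses per gRPC method crosses a threshold, so a steady trickle of "Unavailable" Watch completions is enough to make the alert flap. Paraphrased (not the exact rule text; label matchers and threshold vary by version), the expression is roughly of this form:

100 * sum(rate(grpc_server_handled_total{grpc_code!="OK", job=~".*etcd.*"}[5m])) BY (grpc_service, grpc_method)
  / sum(rate(grpc_server_handled_total{job=~".*etcd.*"}[5m])) BY (grpc_service, grpc_method)
  > 1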

@gaopeiliang

/remove-lifecycle stale

@arslanbekov

arslanbekov commented Nov 29, 2018

+1.
I ran etcd with the debug log level and found this error:

etcdserver/api/v3rpc: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 71; CANCEL")

The error shows up roughly once every 5 minutes, each time with a unique stream ID.

etcd 3.2.24 / 3.2.25 / 3.3.10
Monitoring with Prometheus (I am getting this alert).

Any updates?

@judexzhu

+1, etcd 3.3.10 with Prometheus Operator on Kubernetes 1.11.5.

I have 5 nodes, but only one node is raising the alert; the others seem fine.

The etcd cluster runs well without issues.

[screenshot]

@zqyangchn

[screenshot]

@zqyangchn

/remove-lifecycle stale

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 17, 2019
@Reamer
Author

Reamer commented May 20, 2019

Still reproducible on Origin 3.11

@Reamer
Author

Reamer commented May 20, 2019

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 20, 2019
@Reamer
Author

Reamer commented Jun 21, 2019

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2019
@Reamer
Author

Reamer commented Sep 19, 2019

/remove-lifecycle stale
Still present

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2019
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2019
@Reamer
Author

Reamer commented Dec 19, 2019

/lifecycle frozen
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2019
@openshift-ci-robot openshift-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Dec 19, 2019
@hexfusion
Contributor

/assign

@Joseph94m

Any news about this?

@Reamer
Author

Reamer commented Oct 7, 2021

At the moment I am using OKD 4.7 and this bug is still present.
Prometheus query:

grpc_server_handled_total{grpc_code="Unavailable",grpc_service="etcdserverpb.Watch"}
