failed to publish local member to cluster through raft #15996
Replies: 11 comments
-
You probably want to check the DNS setup, because the DNS lookup failed.
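One quick way to check this from inside a pod is to try resolving each peer hostname. Below is a minimal sketch in Python, not an etcd tool: the helper names `peer_host` and `check_peers` are made up for illustration, and the fallback port 2380 is only an assumption based on the peer URLs in the reported configuration.

```python
import socket
from urllib.parse import urlsplit


def peer_host(url):
    """Return the hostname part of a peer URL such as https://host:2380/."""
    return urlsplit(url).hostname


def check_peers(urls, resolver=socket.getaddrinfo):
    """Map each peer hostname to None (resolved) or the lookup error text.

    `resolver` is injectable so the check can be exercised without a network.
    """
    results = {}
    for url in urls:
        host = peer_host(url)
        # Assumption: fall back to the usual peer port if the URL has none.
        port = urlsplit(url).port or 2380
        try:
            resolver(host, port)
            results[host] = None
        except socket.gaierror as exc:
            results[host] = str(exc)
    return results
```

Calling `check_peers` with the peer URLs of all three members (for example the value of ETCD_INITIAL_ADVERTISE_PEER_URLS on each pod) should return None for every hostname. Any hostname that maps to an error string is one that the pod's resolver cannot look up.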
-
Hi @chaochn47, Thanks in advance
-
Hi Team,
-
Hi Team,
-
cc @jmhbnz
-
Could you please provide evidence that the network issue was resolved? What was broken in the network setup, and what was the fix?
Yes, it's the expected behavior of etcd. For each pair of
https://github.com/xiang90/probing/blob/master/prober.go#L51-L84 If the network issue were resolved, monitorProbingStatus would not keep reporting DNS lookup failures every 5s. The log you provided, however, shows that the claim "the network issue is resolved" does not hold. Not to mention local member
-
Hi @chaochn47,
-
If you delete a pod and a new pod is scheduled to a new node with no network issues, the issue is likely to be resolved.
-
Thank you for your question. This support issue will be moved to our Discussion Forums.
-
What happened?
We are continuously getting the following error in our etcd deployment, and etcd is not coming up. etcd is deployed with 3 replicas.
{"caller":"etcdserver/server.go:2075","error":"etcdserver: request timed out","local-member-attributes":"{Name:etcd-0 ClientURLs:[https://etcd-0.etcd.footprint:2379]}","local-member-id":"a691154c733f3752","message":"failed to publish local member to cluster through raft","metadata":{"container_name":"dced","namespace":"footprint","pod_name":"etcd-0"},"publish-timeout":"7s","request-path":"/0/members/a691154c733f3752/attributes","service_id":"etcd","severity":"warning","timestamp":"2023-05-17T14:24:56.527+00:00","version":"1.2.0"}
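For triage, it can help to pull the relevant fields out of the JSON warning rather than reading the raw line. The following is a rough sketch in Python. The `summarize` helper and the choice of fields are mine, not part of etcd, and the `metadata` block appears to come from the logging setup in this deployment rather than from etcd itself.

```python
import json

# The warning from the report, copied verbatim.
LOG_LINE = (
    '{"caller":"etcdserver/server.go:2075",'
    '"error":"etcdserver: request timed out",'
    '"local-member-attributes":"{Name:etcd-0 ClientURLs:[https://etcd-0.etcd.footprint:2379]}",'
    '"local-member-id":"a691154c733f3752",'
    '"message":"failed to publish local member to cluster through raft",'
    '"metadata":{"container_name":"dced","namespace":"footprint","pod_name":"etcd-0"},'
    '"publish-timeout":"7s",'
    '"request-path":"/0/members/a691154c733f3752/attributes",'
    '"service_id":"etcd","severity":"warning",'
    '"timestamp":"2023-05-17T14:24:56.527+00:00","version":"1.2.0"}'
)


def summarize(line):
    """Extract the fields that matter when looking at a publish timeout."""
    entry = json.loads(line)
    return {
        "pod": entry.get("metadata", {}).get("pod_name"),
        "member_id": entry.get("local-member-id"),
        "message": entry.get("message"),
        "error": entry.get("error"),
        "publish_timeout": entry.get("publish-timeout"),
    }
```

Here `summarize(LOG_LINE)` shows that pod etcd-0 (member a691154c733f3752) gave up publishing its attributes after the 7s publish timeout with "etcdserver: request timed out". That usually points at the member being unable to get the request committed through raft, for example because it cannot reach its peers.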
What did you expect to happen?
etcd should come up and serve requests.
How can we reproduce it (as minimally and precisely as possible)?
We are trying to reproduce it by deleting the 3 etcd pods multiple times (50 or more).
Anything else we need to know?
No response
Etcd version (please run commands below)
bash-4.4$ etcd --version
etcd Version: 3.5.7
Git SHA: 215b53c
Go Version: go1.17.13
Go OS/Arch: linux/amd64
bash-4.4$ etcdctl version
etcdctl version: 3.5.7
API version: 3.5
bash-4.4$
Etcd configuration (command line flags or environment variables)
bash-4.4$ env | grep ETCD
ETCD_INITIAL_CLUSTER_TOKEN=erc
ETCD_MAX_SNAPSHOTS=3
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379/
ETCD_HEARTBEAT_INTERVAL=100
ETCD_AUTO_COMPACTION_RETENTION=100
ETCD_TRUSTED_CA_FILE=/data/combinedca/cacertbundle.pem
ENTRYPOINT_RESTART_ETCD=true
ETCDCTL_CERT=/run/sec/certs/client/clicert.pem
ETCD_LOG_LEVEL=info
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380/
ETCD_AUTO_COMPACTION_MODE=revision
ETCD_LOGGER=zap
ETCD_CERT_FILE=/run/sec/certs/server/srvcert.pem
ETCD_PEER_AUTO_TLS=true
ETCD_DATA_DIR=/data
ETCD_CLIENT_CERT_AUTH=true
ETCDCTL_ENDPOINTS=erc.zmorrah:2379
ETCD_METRICS=basic
ETCDCTL_API=3
ETCD_SNAPSHOT_COUNT=5000
ETCD_MAX_WALS=3
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://erc-0.erc-peer.zmorrah.svc.cluster.local:2380/
ETCD_KEY_FILE=/run/sec/certs/server/srvprivkey.pem
ETCD_ELECTION_TIMEOUT=1000
ETCDCTL_CACERT=/data/combinedca/cacertbundle.pem
ETCD_NAME=erc-0
ETCD_QUOTA_BACKEND_BYTES=268435456
ETCD_ADVERTISE_CLIENT_URLS=https://erc-0.erc.zmorrah:2379/
ETCDCTL_KEY=/run/sec/certs/client/cliprivkey.pem
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
etcd pod 0 itself is not starting, so the debug information could not be captured.
Relevant log output