This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

EtcdCluster stuck in failing state when secret missing upon creation #2129

Open
rafaltrojniak opened this issue Oct 18, 2019 · 0 comments

Problem description

I have a working integration between cert-manager and etcd-operator to manage TLS certificates.
Unfortunately, because all manifests are deployed to the cluster at the same time and the EtcdCluster manifest is applied before the certificate secrets exist, the resulting EtcdCluster is stuck in a failed state (see debug information below).

Deleting and re-creating the same EtcdCluster resolves the situation, but this step makes the deployment no longer fully automatic.

Expected behavior

I would expect the operator to periodically re-validate the dependencies of failed clusters (e.g. every 10s) and, once the dependencies (the secret in this case) appear, resume the creation process.

Debug information

The resulting EtcdCluster object looks like this:

apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  annotations:
    etcd.database.coreos.com/scope: clusterwide
    kubectl.kubernetes.io/last-applied-configuration: [...]
  creationTimestamp: "2019-10-18T13:14:28Z"
  generation: 1
  labels:
    app: catalog-apiserver-etcd
    appId: servicecatalog
  name: catalog-apiserver-etcd
  namespace: etcdtest
  resourceVersion: "61728169"
  selfLink: /apis/etcd.database.coreos.com/v1beta2/namespaces/etcdtest/etcdclusters/catalog-apiserver-etcd
  uid: 36dd75a3-f1a9-11e9-8639-0ab63d02cdd0
spec:
  TLS:
    static:
      member:
        peerSecret: catalog-apiserver-etcd-peer-renamed
        serverSecret: catalog-apiserver-etcd-server-renamed
      operatorSecret: catalog-apiserver-etcd-operator-renamed
  pod:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference: {}
          weight: 1
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: etcd_cluster
                operator: In
                values:
                - catalog-apiserver-etcd
            topologyKey: kubernetes.io/hostname
          weight: 100
    persistentVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      dataSource: null
      resources:
        requests:
          storage: 1Gi
    resources: {}
  repository: quay.io/coreos/etcd
  size: 3
  version: 3.2.25
status:
  currentVersion: ""
  members: {}
  phase: Failed
  reason: secrets "catalog-apiserver-etcd-operator-renamed" not found
  size: 0
  targetVersion: ""

Even though the secret is already there:

$ kubectl get secret catalog-apiserver-etcd-operator-renamed
NAME                                      TYPE     DATA   AGE
catalog-apiserver-etcd-operator-renamed   Opaque   3      20m

The operator logs contain:

time="2019-10-18T13:14:28Z" level=error msg="cluster failed to setup: secrets \"catalog-apiserver-etcd-operator-renamed\" not found" cluster-name=catalog-apiserver-etcd cluster-namespace=etcdtest pkg=cluster
time="2019-10-18T13:14:28Z" level=warning msg="fail to handle event: ignore failed cluster (catalog-apiserver-etcd). Please delete its CR" pkg=controller