
Zarf-Seed-Registry Installation Fails on Init with Deployment is not ready: zarf/zarf-docker-registry error #592

Closed
erikschlegel opened this issue Jul 7, 2022 · 33 comments · Fixed by #2457

Comments

@erikschlegel

erikschlegel commented Jul 7, 2022

Environment

Device and OS: Azure AKS Linux Ubuntu 20.04
App version: 0.19.6
Kubernetes distro being used: AKS Kubernetes V 1.22.6
Other:

Steps to reproduce

  1. Create an AKS Cluster
  2. Run zarf init --components git-server.

Expected result

Command succeeds and Zarf is initialized in the cluster.

Actual Result

The following message repeats until the init run times out:
Deployment is not ready: zarf/zarf-docker-registry. 0 out of 1 expected pods are ready

Output of kubectl -n zarf get events:

LAST SEEN   TYPE      REASON              OBJECT                                       MESSAGE
15m         Normal    Scheduled           pod/injector                                 Successfully assigned zarf/injector to aks-agentpool-40722291-vmss000001
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-018" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-023" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-013" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-008" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-009" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-019" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "stage1" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-027" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-007" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 (combined from similar events): MountVolume.SetUp failed for volume "zarf-payload-027" : failed to sync configmap cache: timed out waiting for the condition
15m         Normal    Scheduled           pod/zarf-docker-registry-789d8ddfb8-4pfgj    Successfully assigned zarf/zarf-docker-registry-789d8ddfb8-4pfgj to aks-agentpool-40722291-vmss000000
14m         Normal    Pulling             pod/zarf-docker-registry-789d8ddfb8-4pfgj    Pulling image "127.0.0.1:32178/library/registry:2.7.1"
14m         Warning   Failed              pod/zarf-docker-registry-789d8ddfb8-4pfgj    Failed to pull image "127.0.0.1:32178/library/registry:2.7.1": rpc error: code = Unknown desc = failed to pull and unpack image "127.0.0.1:32178/library/registry:2.7.1": failed to resolve reference "127.0.0.1:32178/library/registry:2.7.1": failed to do request: Head "https://127.0.0.1:32178/v2/library/registry/manifests/2.7.1": http: server gave HTTP response to HTTPS client
14m         Warning   Failed              pod/zarf-docker-registry-789d8ddfb8-4pfgj    Error: ErrImagePull
34s         Normal    BackOff             pod/zarf-docker-registry-789d8ddfb8-4pfgj    Back-off pulling image "127.0.0.1:32178/library/registry:2.7.1"
13m         Warning   Failed              pod/zarf-docker-registry-789d8ddfb8-4pfgj    Error: ImagePullBackOff
15m         Normal    SuccessfulCreate    replicaset/zarf-docker-registry-789d8ddfb8   Created pod: zarf-docker-registry-789d8ddfb8-4pfgj
15m         Normal    ScalingReplicaSet   deployment/zarf-docker-registry              Scaled up replica set zarf-docker-registry-789d8ddfb8 to 1

Output of kubectl -n zarf get all:

NAME                                        READY   STATUS             RESTARTS   AGE
pod/injector                                1/1     Running            0          40m
pod/zarf-docker-registry-789d8ddfb8-4pfgj   0/1     ImagePullBackOff   0          40m

NAME                           TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
service/zarf-docker-registry   NodePort   10.0.16.59     <none>        5000:31999/TCP   40m
service/zarf-injector          NodePort   10.0.144.122   <none>        5000:32178/TCP   40m

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/zarf-docker-registry   0/1     1            0           40m

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/zarf-docker-registry-789d8ddfb8   1         1         0       40m

Severity/Priority

😕 Blocked on deploying zarf packages to Azure AKS

@jeff-mccoy
Member

We haven't run into this issue on AKS before, but it looks like the node container runtime is trying to call localhost via HTTPS instead of HTTP, which is the standard for containerd and CRI-O. Is there any special config or other details about this provisioning that might change the container runtime, by chance?

Would be helpful to run 'zarf destroy --confirm --remove-components' and then 'zarf init -l=trace'.

Sorry if markdown is weird, using the GitHub app right now.
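
For anyone trying to confirm the mismatch, a quick check from a node is to probe the seed-registry NodePort directly. This is only a sketch: the port (32178) comes from the events above and will differ per cluster, and it assumes shell access to a node (for example via SSH or a kubectl debug node session).

# Plain HTTP should answer; HTTPS should fail the way the kubelet error shows.
curl -v  http://127.0.0.1:32178/v2/
curl -vk https://127.0.0.1:32178/v2/

# Then the commands suggested above:
zarf destroy --confirm --remove-components
zarf init -l=trace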

@erikschlegel
Author

erikschlegel commented Jul 7, 2022

Thanks for the response @jeff-mccoy. The strange thing is I provisioned a standard AKS cluster (v1.22.6) using the default settings from Azure, so nothing custom.

Here's the output from zarf init -l=trace

  DEBUG   Processing k8s manifest files /var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml                                                                                                                
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:64)

  DEBUG   template.Apply({zarf-seed-registry  false true [] [{docker-registry  https://github.com/defenseunicorns/docker-registry.helm.git 2.1.0-zarf zarf [packages/zarf-registry/registry-values.yaml packages/zarf-registry/registry-values-seed.yaml] }] [] [] [] [] {false 0 false [] []} { }  map[] }, /var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml)
└ (/home/runner/work/zarf/zarf/src/internal/template/template.go:73)

  DEBUG   map[GIT_AUTH_PULL:185e3dd5b35d633d615354ee GIT_AUTH_PUSH:f1d8c60b3ec4f99466124dab HTPASSWD:zarf-push:xxxx\nzarf-pull:$2a$10$NYDmZc5aDnU9EwSTD2SEz.7rWzO4xwAKgy0QPqD58nynmtrl5pTSu NODEPORT:31999 REGISTRY:127.0.0.1:31999 REGISTRY_AUTH_PULL:f278dfc43affbee63753e364d86c70f8aab81ee51f45c02b REGISTRY_AUTH_PUSH:6009132b6c664291254fbe76e305ed948defad0f7aafe9de REGISTRY_SECRET:xxx SEED_REGISTRY:127.0.0.1:31917 STORAGE_CLASS:]
└ (/home/runner/work/zarf/zarf/src/internal/template/template.go:112)
  DEBUG   [{/var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml # Source: docker-registry/templates/secret.yaml                                                                                            
          apiVersion: v1
          kind: Secret
          metadata:
            name: zarf-docker-registry-secret
            namespace: zarf
            labels:
              app: docker-registry
              chart: docker-registry-2.1.0-zarf
              heritage: Helm
              release: zarf-docker-registry
          type: Opaque
          data:
            validateSecretValue: "xxxxx"
            configData: "xxxxx"
            htpasswd: xxxx/T/zarf-1393582458/chart.yaml # Source: docker-registry/templates/service.yaml
          apiVersion: v1
          kind: Service
          metadata:
            name: zarf-docker-registry
            namespace: zarf
            labels:
              app: docker-registry
              chart: docker-registry-2.1.0-zarf
              release: zarf-docker-registry
              heritage: Helm
          spec:
            type: NodePort
            ports:
              - port: 5000
                protocol: TCP
                name: http-5000
                targetPort: 5000
                nodePort: 31999
            selector:
              app: docker-registry
              release: zarf-docker-registry 0xc002547ad0} {/var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml # Source: docker-registry/templates/deployment.yaml
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: zarf-docker-registry
            namespace: zarf
            labels:
              app: docker-registry
              chart: docker-registry-2.1.0-zarf
              release: zarf-docker-registry
              heritage: Helm
          spec:
            selector:
              matchLabels:
                app: docker-registry
                release: zarf-docker-registry
            replicas: 1
            minReadySeconds: 5
            template:
              metadata:
                labels:
                  app: docker-registry
                  release: zarf-docker-registry
                annotations:
                  checksum/secret: xxxxx
              spec:
                imagePullSecrets:
                  - name: private-registry
                securityContext:
                  fsGroup: 1000
                  runAsUser: 1000
                containers:
                  - name: docker-registry
                    image: "127.0.0.1:31917/library/registry:2.7.1"
                    imagePullPolicy: IfNotPresent
                    command:
                    - /bin/registry
                    - serve
                    - /etc/docker/registry/config.yml
                    ports:
                      - containerPort: 5000
                    livenessProbe:
                      httpGet:
                        path: /
                        port: 5000
                    readinessProbe:
                      httpGet:
                        path: /
                        port: 5000
                    resources:
                      limits:
                        cpu: "3"
                        memory: 2Gi
                      requests:
                        cpu: 500m
                        memory: 256Mi
                    env:
                      - name: REGISTRY_AUTH
                        value: "htpasswd"
                      - name: REGISTRY_AUTH_HTPASSWD_REALM
                        value: "Registry Realm"
                      - name: REGISTRY_AUTH_HTPASSWD_PATH
                        value: "/etc/docker/registry/htpasswd"
                      - name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
                        value: "/var/lib/registry"
                    volumeMounts:
                      - name: data
                        mountPath: /var/lib/registry/
                      - name: config
                        mountPath: "/etc/docker/registry"
                volumes:
                  - name: config
                    secret:
                      secretName: zarf-docker-registry-secret
                      items:
                      - key: configData
                        path: config.yml
                      - key: htpasswd
                        path: htpasswd
                  - name: data
                    emptyDir: {} 0xc002547f80}]
└ (/home/runner/work/zarf/zarf/src/internal/helm/post-render.go:73)
  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.GenerateRegistryPullCreds(zarf, private-registry)                                                                                                                                                                    
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:55)

  DEBUG   k8s.GenerateSecret(zarf, private-registry)                                                                                                                                                                               
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:35)

  DEBUG   config.GetSecret(registry-pull)                                                                                                                                                                                          
└ (/home/runner/work/zarf/zarf/src/config/secret.go:38)

  DEBUG   k8s.getSecret(zarf, private-registry)                                                                                                                                                                                    
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:29)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.ReplaceSecret(&Secret{ObjectMeta:{private-registry  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{.dockerconfigjson: [123 34 97 117 116 104 115 34 58 123 34 49 50 55 46 48 46 48 46 49 58 51 49 57 57 57 34 58 123 34 97 117 116 104 34 58 34 101 109 70 121 90 105 49 119 100 87 120 115 79 109 89 121 78 122 104 107 90 109 77 48 77 50 70 109 90 109 74 108 90 84 89 122 78 122 85 122 90 84 77 50 78 71 81 52 78 109 77 51 77 71 89 52 89 87 70 105 79 68 70 108 90 84 85 120 90 106 81 49 89 122 65 121 89 103 61 61 34 125 125 125],},Type:kubernetes.io/dockerconfigjson,StringData:map[string]string{},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:115)

  DEBUG   k8s.CreateNamespace(zarf)                                                                                                                                                                                                
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:31)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   &Namespace{ObjectMeta:{zarf    2ef9c292-56d0-4c81-bdce-1b88116eccbb 169820 0 2022-07-07 08:06:58 -0500 CDT <nil> <nil> map[app.kubernetes.io/managed-by:zarf kubernetes.io/metadata.name:zarf] map[] [] []  [{zarf Update v1 2022-07-07 08:06:58 -0500 CDT FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{},"f:kubernetes.io/metadata.name":{}}}} }]},Spec:NamespaceSpec{Finalizers:[kubernetes],},Status:NamespaceStatus{Phase:Active,Conditions:[]NamespaceCondition{},},}
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:57)
  DEBUG   k8s.DeleteSecret(&Secret{ObjectMeta:{private-registry  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{.dockerconfigjson: [123 34 97 117 116 104 115 34 58 123 34 49 50 55 46 48 46 48 46 49 58 51 49 57 57 57 34 58 123 34 97 117 116 104 34 58 34 101 109 70 121 90 105 49 119 100 87 120 115 79 109 89 121 78 122 104 107 90 109 77 48 77 50 70 109 90 109 74 108 90 84 89 122 78 122 85 122 90 84 77 50 78 71 81 52 78 109 77 51 77 71 89 52 89 87 70 105 79 68 70 108 90 84 85 120 90 106 81 49 89 122 65 121 89 103 61 61 34 125 125 125],},Type:kubernetes.io/dockerconfigjson,StringData:map[string]string{},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:129)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.CreateSecret(&Secret{ObjectMeta:{private-registry  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{.dockerconfigjson: [123 34 97 117 116 104 115 34 58 123 34 49 50 55 46 48 46 48 46 49 58 51 49 57 57 57 34 58 123 34 97 117 116 104 34 58 34 101 109 70 121 90 105 49 119 100 87 120 115 79 109 89 121 78 122 104 107 90 109 77 48 77 50 70 109 90 109 74 108 90 84 89 122 78 122 85 122 90 84 77 50 78 71 81 52 78 109 77 51 77 71 89 52 89 87 70 105 79 68 70 108 90 84 85 120 90 106 81 49 89 122 65 121 89 103 61 61 34 125 125 125],},Type:kubernetes.io/dockerconfigjson,StringData:map[string]string{},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:143)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.GenerateSecret(zarf, private-git-server)                                                                                                                                                                             
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:35)

  DEBUG   config.GetSecret(git-pull)                                                                                                                                                                                               
└ (/home/runner/work/zarf/zarf/src/config/secret.go:38)

  DEBUG   k8s.ReplaceSecret(&Secret{ObjectMeta:{private-git-server  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{},Type:Opaque,StringData:map[string]string{password: xxxxx,username: zarf-git-read-user,},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:115)

  DEBUG   k8s.CreateNamespace(zarf)                                                                                                                                                                                                
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:31)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   &Namespace{ObjectMeta:{zarf    2ef9c292-56d0-4c81-bdce-1b88116eccbb 169820 0 2022-07-07 08:06:58 -0500 CDT <nil> <nil> map[app.kubernetes.io/managed-by:zarf kubernetes.io/metadata.name:zarf] map[] [] []  [{zarf Update v1 2022-07-07 08:06:58 -0500 CDT FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{},"f:kubernetes.io/metadata.name":{}}}} }]},Spec:NamespaceSpec{Finalizers:[kubernetes],},Status:NamespaceStatus{Phase:Active,Conditions:[]NamespaceCondition{},},}
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:57)
  DEBUG   k8s.DeleteSecret(&Secret{ObjectMeta:{private-git-server  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{},Type:Opaque,StringData:map[string]string{password: xxxxx,username: zarf-git-read-user,},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:129)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.CreateSecret(&Secret{ObjectMeta:{private-git-server  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{},Type:Opaque,StringData:map[string]string{password: xxxx,username: zarf-git-read-user,},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:143)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  ⠋  Deployment is not ready: zarf/zarf-docker-registry. 0 out of 1 expected pods are ready (2m24s)    

@erikschlegel erikschlegel changed the title Zarf Registry Injector Fails on Init with Deployment is not ready: zarf/zarf-docker-registry error Zarf-Seed-Registry Installation Fails on Init with Deployment is not ready: zarf/zarf-docker-registry error Jul 7, 2022
@jeff-mccoy
Member

Thanks @erikschlegel, this definitely looks like a CRI change on AKS that we'll need to play with a bit; I'll spin up AKS again this weekend to see if we can reproduce it. Are you provisioning AKS with IaC or via the Azure web interface?

@jeff-mccoy jeff-mccoy self-assigned this Jul 8, 2022
@erikschlegel
Author

erikschlegel commented Jul 9, 2022

I'm provisioning the cluster directly through the Azure Portal. I confirmed that I was able to successfully initialize Zarf using Kubernetes version 1.21. I suspect this is a containerd issue, as it's configured slightly differently on AKS 1.22+. This PR may be worth checking out: Azure/AgentBaker#1369

@JasonvanBrackel
Contributor

@jeff-mccoy Any update on this?

@erikschlegel
Author

Hi @jeff-mccoy - AKS Kubernetes version 1.21 is no longer available for deployment in the portal, and now none of the supported AKS versions appear to work with Zarf. Do you happen to have an update?

@Uninstall4735

Uninstall4735 commented Oct 31, 2022

Hello, I am experiencing the same issue. Here is my output from kubectl describe pods:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  48s                default-scheduler  Successfully assigned zarf/zarf-docker-registry-796648965f-2fjw5 to aks-demo-97972695-vmss000000
  Normal   BackOff    19s (x2 over 48s)  kubelet            Back-off pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     19s (x2 over 48s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    8s (x3 over 48s)   kubelet            Pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     8s (x3 over 48s)   kubelet            Failed to pull image "127.0.0.1:32380/library/registry:2.7.1": rpc error: code = Unknown desc = failed to pull and unpack image "1
27.0.0.1:32380/library/registry:2.7.1": failed to resolve reference "127.0.0.1:32380/library/registry:2.7.1": failed to do request: Head "https://127.0.0.1:32380/v2/library/registry/manifest
s/2.7.1": http: server gave HTTP response to HTTPS client
  Warning  Failed     8s (x3 over 48s)   kubelet            Error: ErrImagePull

version 21.2 in an Azure US Government AKS cluster

@dsmithbauer
Contributor

I can confirm this is still an issue.

Hello, I am experiencing the same issue. Here is my output from kubectl describe pods:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  48s                default-scheduler  Successfully assigned zarf/zarf-docker-registry-796648965f-2fjw5 to aks-demo-97972695-vmss000000
  Normal   BackOff    19s (x2 over 48s)  kubelet            Back-off pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     19s (x2 over 48s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    8s (x3 over 48s)   kubelet            Pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     8s (x3 over 48s)   kubelet            Failed to pull image "127.0.0.1:32380/library/registry:2.7.1": rpc error: code = Unknown desc = failed to pull and unpack image "1
27.0.0.1:32380/library/registry:2.7.1": failed to resolve reference "127.0.0.1:32380/library/registry:2.7.1": failed to do request: Head "https://127.0.0.1:32380/v2/library/registry/manifest
s/2.7.1": http: server gave HTTP response to HTTPS client
  Warning  Failed     8s (x3 over 48s)   kubelet            Error: ErrImagePull

version 21.2 in an Azure US Government AKS cluster

@jeff-mccoy
Member

Yeah, they must be doing something special; containerd upstream still serves localhost over HTTP and even tests for it. Digging into this more this week: https://github.com/containerd/containerd/blob/main/pkg/cri/server/image_pull_test.go

@Racer159
Contributor

Racer159 commented Nov 2, 2022

While we work on this, as a note on a potential workaround, you can also use an external registry as described here: https://docs.zarf.dev/docs/user-guide/the-zarf-cli/cli-commands/zarf_init

Under the Integrations tab in the Azure console you can tie an Azure Container Registry to your cluster and then init it with something like this:
zarf init --registry-push-password={PASSWORD} --registry-push-username={USERNAME} --registry-url={REGISTRY}.azurecr.io

(you can also specify a separate --registry-pull-password and --registry-pull-username and can load the username(s)/password(s) from a zarf-config.toml as described here: https://docs.zarf.dev/docs/user-guide/the-zarf-cli/cli-commands/zarf_prepare_generate-config#synopsis)

(also note you will need to be on v0.22.1 or higher)
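
Putting those flags together, an init against an attached ACR might look like the following (a rough sketch; the registry name and credentials are placeholder values, and only the flags mentioned above are used):

zarf init \
  --registry-url=myregistry.azurecr.io \
  --registry-push-username=zarf-push \
  --registry-push-password='<push-token>' \
  --registry-pull-username=zarf-pull \
  --registry-pull-password='<pull-token>' \
  --confirm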

@jeff-mccoy
Member

Put some notes in a new issue after digging around a bit. Will try to test an older version of AKS later on tonight.

@jeff-mccoy
Member

Also tested on AKS 1.22.11 and seeing the same results.

@jeff-mccoy
Member

Added some new notes at Azure/AKS#3303 (comment).

Looks like a bug in containerd that was patched 2 weeks ago. In the interim, Acorn ran into this issue too and did what we've been trying to avoid (modifying the containerd config). Root cause: the change @erikschlegel identified, which allows containerd registry config overrides in AKS, surfaced the underlying containerd issue.

@jeff-mccoy
Member

@cheruvu1 if you have any other data you'd like to drop on this issue, please leave it here. Thanks!

@jsburckhardt
Contributor

jsburckhardt commented Nov 30, 2022

Hi folks, as a workaround (one that includes the patch by iceberg) you can update containerd in your cluster: just apt update and upgrade the node. I do it through a DaemonSet (example):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: update-cluster
  labels:
    app: update-cluster
spec:
  selector:
    matchLabels:
      app: update-cluster
  template:
    metadata:
      labels:
        app: update-cluster
    spec:
      containers:
      - name: update-cluster
        image: alpine
        imagePullPolicy: IfNotPresent
        command:
          - nsenter
          - --target
          - "1"
          - --mount
          - --uts
          - --ipc
          - --net
          - --pid
          - --
          - sh
          - -c
          - |
            # apt update and upgrade (export DEBIAN_FRONTEND so apt runs non-interactively)
            export DEBIAN_FRONTEND=noninteractive
            apt update && apt upgrade -y
            sleep infinity
        securityContext:
          privileged: true
      dnsPolicy: ClusterFirst
      hostPID: true
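
Applied with something like the following (the filename is hypothetical); once the pods report Running the upgrade has been kicked off on every node, and the DaemonSet can be removed afterwards:

kubectl apply -f update-cluster-daemonset.yaml
kubectl rollout status daemonset/update-cluster
# once containerd has been upgraded on the nodes, clean up:
kubectl delete daemonset update-cluster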


I was able to initialize Zarf.

I also deployed Big Bang into AKS, but hit a "bump":

  1. gatekeeper has the label control-plane: controller-manager on its namespace, preventing the hook from changing the image; after removing the label, everything went smoothly. (In RKE2 I didn't hit this problem.)

@brandtkeller
Member

Ran into this problem while attempting to initialize Zarf on a Nutanix Kubernetes cluster running Kubernetes v1.22.9 and containerd 1.6.6.

@ntwkninja
Member

ntwkninja commented Jan 31, 2023

Encountered the same on EKS v1.24 using the v0.24.0-rc3 binary and init package.

@ntwkninja
Member

EKS v1.23 works without issue because it is still using Docker rather than containerd.

@jeff-mccoy
Member

Tracking EKS AMI containerd update: awslabs/amazon-eks-ami#1162

@jsburckhardt
Contributor

jsburckhardt commented Feb 24, 2023

Quick update: containerd was updated and I could deploy Big Bang 1.48.0.

@ntwkninja
Member

Tracking EKS AMI containerd update: awslabs/amazon-eks-ami#1162

The upstream issue has been closed, and @brianrexrode has successfully tested Zarf v0.25.2 with EKS v1.26.

@Racer159 Racer159 self-assigned this Apr 18, 2023
@Racer159 Racer159 assigned jeff-mccoy and unassigned jeff-mccoy Apr 18, 2023
@Racer159 Racer159 assigned jeff-mccoy and unassigned Racer159 and jeff-mccoy Feb 2, 2024
@AbrohamLincoln
Contributor

I've run into this with containerd > 1.6.25.
It seems the default behavior changed here: containerd/containerd#9300

I've been commenting out the following lines in the containerd config.toml to work around it:

# [plugins."io.containerd.grpc.v1.cri".registry]
# config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
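
A rough sketch of applying that on a node (hedged: the path assumes a stock containerd install at /etc/containerd/config.toml and a systemd-managed service, so adjust for your distro or node image):

# comment out the registry config_path override, then restart containerd
sudo sed -i 's|^\([[:space:]]*config_path = .*\)|# \1|' /etc/containerd/config.toml
sudo systemctl restart containerd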

@Racer159
Contributor

Racer159 commented Feb 7, 2024

@AbrohamLincoln thanks for the note! We're exploring other options too; for others, note that this also affects newer versions of containerd 1.7 (>=1.7.7), and 2.0 if anyone is on the betas.

containerd/containerd#9299

@Racer159 Racer159 pinned this issue Feb 8, 2024
@Jdavid77

Jdavid77 commented Mar 1, 2024

Having this issue with AKS Kubernetes 1.28.3 as well. Any updates on this?

@philiversen

Seeing similar behavior on EKS 1.26 through 1.29, all of which are using containerd 1.7.11. Zarf init was previously (late January) working on EKS 1.26 with containerd 1.7.2. Confirmed that going back to EKS 1.23 with Docker runtime 20.10.25 does work, but that version is out of support.

@jnt2007

jnt2007 commented Mar 15, 2024

Hi guys. Any plans here? Still having a problem with k8s 1.27+ versions with containerd 1.7.11.

Any recommendations from the community on how we can "tune" a containerd config to avoid this issue?

@philiversen

Hi guys. Any plans here? Still having a problem with k8s 1.27+ versions with containerd 1.7.11.

Any recommendations from the community on how we can "tune" a containerd config to avoid this issue?

Commenting out the containerd config lines mentioned in this post got things working again for me.

@mjnagel
Contributor

mjnagel commented Mar 20, 2024

Also ran into this issue on newer RKE2 versions. It seems linked back to this commit which introduced the config_path line. The containerd update to 1.7.7+ did not by itself cause the issue (RKE2 1.29.0 works and is on containerd 1.7.11); it was specifically the introduction of that containerd config line 👀.

Just for reference, affected versions of k3s/RKE2 appear to be 1.29.1+, 1.28.6+, and 1.27.10+. Definitely curious whether there is anything to address this on the Zarf side, or whether this should make its way into the docs as a recommended prerequisite/setup step for the cluster.

@jeff-mccoy
Member

All things considered, we may need to look for a way to avoid the localhost/HTTP behavior, since containerd has introduced bugs around it multiple times in the past year or so. containerd/containerd#9188

@lucasrod16
Member

A new issue has been opened against containerd to address this: containerd/containerd#10014

@JasonRodriguez1474

Hi guys. Any plans here? Still having a problem with k8s 1.27+ versions with containerd 1.7.11.
Any recommendations from the community on how we can "tune" a containerd config to avoid this issue?

Commenting out the containerd config lines mentioned in this post got things working again for me.

Does anyone know if this config-line fix is in the default version of K3s that zarf init deploys?
I've been playing around with this issue and am observing the ImagePullBackOff for the zarf-docker-registry in various versions of containerd when I connect to a cluster I started on my own, but not when I use the K3s that Zarf deploys as part of zarf init.

Kubernetes version | Container runtime | Runtime version | zarf-docker-registry init | Zarf-deployed K3s?
v1.28.8+k3s1       | containerd        | 1.7.10          | Fails                     | No (Rancher Desktop)
v1.28.8+k3s1       | docker            | 24.0.7          | Works                     | No (Rancher Desktop)
v1.28.8+k3s1       | containerd        | 1.7.11-k3s2     | Fails                     | No (standard K3s install)
v1.28.4+k3s2       | containerd        | 1.7.7-k3s1      | Works                     | Yes

Error message observed when it fails

err="failed to \"StartContainer\" for \"docker-registry\" with ImagePullBackOff: \"Back-off pulling image \\\"127.0.0.1:31585/library/registry:2.8.3\\\"\"" pod="zarf/zarf-docker-registry-6c9d98c95d-z7k6r" 

@lucasrod16
Member

A fix has been merged into containerd and backported to 1.7, with a 1.6 backport pending: containerd/containerd#10109

I have not tested the fix myself yet, but hoping it resolves this issue 🤞

@lucasrod16
Member

The containerd fix has been released in v1.7.16: https://github.com/containerd/containerd/releases/tag/v1.7.16
