Windows Node cannot pull images from private registry #7103

Open
luisxkimo opened this issue Oct 23, 2024 · 12 comments

@luisxkimo

Environmental Info:
RKE2 Version:
rke2 version v1.30.5+rke2r1 (0c83bc8)
go version go1.22.6 X:boringcrypto

Node(s) CPU architecture, OS, and Version:
Windows Server 2022 21H2 Build 20348.2700

Cluster Configuration:
2 managers and 2 workers, all of them on RHEL

Describe the bug:
Windows Node cannot pull images from private registry.

Steps To Reproduce:

  • Create a pod whose image name references the private registry

Expected behavior:
Image is downloaded and pod is running

Actual behavior:
The pod does not start; pulling the container image fails with an error

Additional context / logs:
Here is an example of the "describe" output for the pod:

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  3s    default-scheduler  Successfully assigned kube-system/csi-proxy-mzlvg to win-node.internal.ad.com
  Normal   Pulling    2s    kubelet            Pulling image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"
  Warning  Failed     2s    kubelet            Failed to pull image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2": failed to pull and unpack image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2": failed to resolve reference "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2": unexpected status from HEAD request to https://ghcr.internal-cache.com/v2/kubernetes-sigs/sig-windows/csi-proxy/manifests/v1.1.2: 403
  Warning  Failed     2s    kubelet            Error: ErrImagePull
  Normal   BackOff    2s    kubelet            Back-off pulling image "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"

Here is the configuration of the file C:\etc\rancher\rke2\registries.yaml:

configs:
  docker.internal-cache.com:
    auth:
      password: thPassword
      username: username2
  ghcr.internal-cache.com:
    auth:
      password: ghcrpassWord
      username: usernameGHCR
@brandond
Member

brandond commented Oct 23, 2024

Are you sure the credentials are correct? Are you sure that the image exists on that registry?

Can you pull that image successfully if you run the following?

ctr -n k8s.io image pull --user USER:PASSWORD ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2

Assuming the tag exists and the creds are correct, you might also check containerd.log to see if it contains any more useful information on why the pull is failing.
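Another way to check the registry's answer outside containerd is to repeat by hand the HEAD request shown in the pod events. The following is only a minimal sketch: it assumes the registry accepts HTTP Basic auth directly (many registries instead answer 401 with a Bearer token challenge, which this does not handle), and `manifest_head_request` and `probe` are hypothetical helper names, not part of RKE2 or containerd.

```python
import base64
import urllib.error
import urllib.request


def manifest_head_request(image: str, user: str, password: str) -> urllib.request.Request:
    """Build a HEAD request for the manifest of host/repo/path:tag.

    Assumes a fully qualified reference with a tag (no digest, no
    implicit docker.io), which holds for the image in this issue.
    """
    host, _, remainder = image.partition("/")
    repository, _, tag = remainder.rpartition(":")
    url = f"https://{host}/v2/{repository}/manifests/{tag}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    accept = ", ".join([
        "application/vnd.oci.image.index.v1+json",
        "application/vnd.oci.image.manifest.v1+json",
        "application/vnd.docker.distribution.manifest.list.v2+json",
        "application/vnd.docker.distribution.manifest.v2+json",
    ])
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"Authorization": f"Basic {token}", "Accept": accept},
    )


def probe(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status instead of raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling `probe(manifest_head_request(image, "USER", "PASSWORD"))` from the Windows node returns the status code. If that succeeds while the kubelet pull still gets a 403, it would point toward containerd not sending the configured credentials rather than the credentials themselves being wrong.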

@luisxkimo
Author

Hi @brandond,

Yes, pulling the image manually with that command works fine using the same credentials as in the registries.yaml file.

I can't find any containerd.log inside C:\var\log or its subfolders, but in any case, given the "403" error in the pod events, I guess the issue is related to the credentials it is trying to use.

Maybe the issue is that my registries.yaml file is wrong or is in the wrong path. It is currently at C:\etc\rancher\rke2\registries.yaml

@brandond
Member

brandond commented Oct 24, 2024

That should be the right path. Are there any errors in the system log regarding the contents of that file? Do you see the registries and creds in c:/var/lib/rancher/rke2/agent/etc/containerd/config.toml?

@luisxkimo
Author

Yes, I can see a section in this config.toml like:

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "C:\\var\\lib\\rancher\\rke2\\agent\\etc\\containerd\\certs.d"

  [plugins."io.containerd.grpc.v1.cri".registry.configs.auth."docker.internal-cache.com"]
    username = "username2"
    password = "thPassword"

  [plugins."io.containerd.grpc.v1.cri".registry.configs.auth."ghcr.internal-cache.com"]
    username = "usernameGHCR"
    password = "ghcrpassWord"

@brandond
Member

That's correct then, and all that K3s is responsible for managing. Take a look at the containerd.log (also under the rke2 agent dir) and see what that says.

@brandond
Member

brandond commented Oct 25, 2024

You can confirm that you're pulling images from docker.internal-cache.com? It doesn't have a port when referenced in the image name, or something else that would make a string comparison fail?
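To illustrate the kind of string comparison failure meant here, a small sketch follows. It assumes credentials are looked up by the exact registry host string taken from the image reference (an assumption for illustration, not confirmed containerd behaviour), and `registry_host` and `has_credentials` are hypothetical helpers using the common Docker-style heuristic for spotting a registry prefix.

```python
def registry_host(image: str) -> str:
    """Return the registry part of an image reference, including any port.

    Heuristic: the first path component is treated as a registry only if
    it contains a dot or a colon, or is exactly "localhost".
    """
    first, separator, _ = image.partition("/")
    if separator and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def has_credentials(image: str, configured_hosts: set[str]) -> bool:
    """True if the image's registry host exactly matches a configured key."""
    return registry_host(image) in configured_hosts


# Keys as they appear under "configs:" in the registries.yaml from this issue.
configured = {"docker.internal-cache.com", "ghcr.internal-cache.com"}

plain = "ghcr.internal-cache.com/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"
# Hypothetical variant with a port, which would no longer match the key.
with_port = "ghcr.internal-cache.com:5000/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2"

print(has_credentials(plain, configured))
print(has_credentials(with_port, configured))
```

Under that assumption, the image name from the pod events matches its key, while the same image referenced with a port would not.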

@luisxkimo
Author

Hi,

Yes, I confirmed that the image is downloaded from the cache.

Anyway, we finally abandoned the approach of using a Windows node because of this kind of issue and others related to CSI plugins.

So I will close the thread without a real solution to the original issue, as we cannot continue tracking this for now.

Thanks a lot @brandond for your support.

@HarrisonWAffel
Contributor

HarrisonWAffel commented Jan 30, 2025

I think this ticket should be reopened, or I can create a new one.

It looks like the issue is that the template used by k3s to render the config.toml file onto the node uses invalid syntax when setting up the credentials. I was able to reproduce this issue and resolved it by creating a config.toml.tmpl file that is nearly identical to the one shipped in k3s, with the only change being where .auth is placed:

[plugins."io.containerd.grpc.v1.cri".registry.configs.auth."{{$k}}"]

To

[plugins."io.containerd.grpc.v1.cri".registry.configs."{{$k}}".auth]

plugins."io.containerd.snapshotter.v1.stargz".registry.configs."{{$k}}".auth

This matches Linux and properly pulls images from an authenticated registry. Looking through git blame, it seems that this section of the template hasn't been updated since it was first added. I don't see any mention of this use case in rancher/rancher or any of the Windows repositories, so I'm not sure whether authenticated registries were ever tested properly on Windows (though I could be wrong; I'm not sure if the rke2/k3s team tested this at some point).

SURE-9200

@brandond
Member

brandond commented Jan 30, 2025

@HarrisonWAffel In the Linux template the auths are in there twice, once for cri and once for stargz. Windows doesn't support the stargz snapshotter, so they're only in there once, for cri.

Windows:

Linux:

Note that stargz is not enabled by default; you have to start the node with --snapshotter=stargz for that section to be used at all.

So I don't think that's related to your problem.

@HarrisonWAffel
Contributor

Oh, my bad, that's a copy-paste error on my part; I've updated my comment. In my test I did properly use plugins."io.containerd.grpc.v1.cri".registry.configs, with the only change being the placement of .auth

@brandond
Member

brandond commented Jan 30, 2025

Ah ok. The whole windows template is hot garbage to begin with. 90% of it is unnecessary copy-pasted defaults that don't even need to be explicitly set. Like most of the initial windows work, it was done in a rush by a hotshot team that's no longer with the company, and we haven't had time to undo all of it yet.

I am doing away with split templates as part of the containerd 2.0 (config version 3) upgrade; I might take a look at consolidating the containerd 1.7 (config version 2) templates in a separate PR. Ref: k3s-io/k3s#11626 (comment)

brandond reopened this on Jan 30, 2025
brandond self-assigned this on Jan 30, 2025
brandond added this to the 2025-02 Release Cycle milestone on Jan 30, 2025