
DNS timeout in Windows #7381

Closed
rabejens opened this issue Dec 3, 2024 · 3 comments
@rabejens

rabejens commented Dec 3, 2024

Environmental Info:
RKE2 Version:

Linux:

rke2 version v1.30.6+rke2r1 (2959cd2193af9ed18d0fc2912fc5c11d6462103d)
go version go1.22.8 X:boringcrypto

Windows:

rke2.exe version v1.30.6+rke2r1 (2959cd2193af9ed18d0fc2912fc5c11d6462103d)
go version go1.22.8

Node(s) CPU architecture, OS, and Version:

Linux controlplane 6.8.0-49-generic #49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov  4 02:06:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

The Windows node is running Windows Server 2022.

Cluster Configuration:

NAME              STATUS   ROLES                       AGE   VERSION
controlplane      Ready    control-plane,etcd,master   11h   v1.30.6+rke2r1
linux-agent01     Ready    <none>                      11h   v1.30.6+rke2r1
linux-agent02     Ready    <none>                      10h   v1.30.6+rke2r1
linux-agent03     Ready    <none>                      10h   v1.30.6+rke2r1
windows-agent01   Ready    <none>                      10h   v1.30.6

Describe the bug:
When resolving names from a Windows container, all DNS requests time out.

Steps To Reproduce:

  • Install RKE2 with Flannel
curl -sfL https://get.rke2.io | sh -
  • Modify /usr/local/lib/systemd/system/rke2-server.service so that the ExecStart line reads: ExecStart=/usr/local/bin/rke2 server --cni flannel
  • Enable and start the RKE2 server service as described in the Quick Start
  • Install the Linux agents as described in the Quick Start
  • Install the Windows agent as described in the Quick Start
  • Disable the Windows Firewall
  • Deploy two Deployments and their Services:
apiVersion: v1
kind: Service
metadata:
  name: simple-server
  namespace: default
spec:
  selector:
    app: simple-server
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-server
  namespace: default
spec:
  selector:
    matchLabels:
      app: simple-server
  template:
    metadata:
      labels:
        app: simple-server
    spec:
      containers:
      - name: simple-server
        image: nginx:1.27.3
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld
  namespace: default
  labels:
    app: helloworld
spec:
  replicas: 1
  template:
    metadata:
      name: helloworld
      labels:
        app: helloworld
    spec:
      tolerations:
        - key: kubernetes.io/os
          operator: Equal
          value: windows
          effect: NoSchedule
      nodeSelector:
        "kubernetes.io/os": windows
      containers:
      - name: helloworld
        image: mcr.microsoft.com/dotnet/framework/samples:aspnetapp
        imagePullPolicy: Always
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
        ports:
          - containerPort: 80
  selector:
    matchLabels:
      app: helloworld
---
apiVersion: v1
kind: Service
metadata:
  name: helloworld
  namespace: default
  labels:
    app: helloworld
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: helloworld
  • Open a shell in the Windows container:
kubectl exec -ti deploy/helloworld -- cmd
  • Run nslookup:
nslookup simple-server

Expected behavior:
The in-cluster IP address of simple-server should be resolved.

Actual behavior:

DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  10.43.0.10
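To rule out nslookup itself, the same check can be reproduced with a hand-built DNS query. The sketch below is a minimal illustration, not part of RKE2: it encodes an RFC 1035 A-record query and sends it over UDP to the cluster DNS address from the kubelet flags (10.43.0.10), waiting the same 2 seconds nslookup reports. It assumes the default cluster domain cluster.local and the default namespace used in the manifests, so the fully qualified name is simple-server.default.svc.cluster.local. The helper names build_query and probe are hypothetical.

```python
import socket
import struct

# Cluster DNS address from the kubelet flag --cluster-dns in this issue.
CLUSTER_DNS = "10.43.0.10"


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal RFC 1035 query asking for the A record of `name`."""
    # Header: ID, flags (only RD = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(name: str, server: str = CLUSTER_DNS, port: int = 53, timeout: float = 2.0) -> str:
    """Send one UDP query and report whether the server answered within `timeout`."""
    qid = 0x1234
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, port))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
    # The reply must echo our ID; the low nibble of byte 3 is RCODE (0 = NOERROR).
    (rid,) = struct.unpack("!H", data[:2])
    return f"answered: id_ok={rid == qid} rcode={data[3] & 0x0F}"
```

Run from a pod that has a Python interpreter (an assumption; the helloworld image may not ship one), probe("simple-server.default.svc.cluster.local") would be expected to return "timeout" in the situation described here and an answer with rcode=0 once cluster DNS is reachable.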

Additional context / logs:
Running RKE2 in the foreground on Windows doesn't show anything unusual:

time="2024-12-03T08:37:47Z" level=info msg="Starting rke2 agent v1.30.6+rke2r1 (2959cd2193af9ed18d0fc2912fc5c11d6462103d)"
time="2024-12-03T08:37:47Z" level=info msg="Adding server to load balancer rke2-agent-load-balancer: controlplane.clustertest.hdk:9345"
time="2024-12-03T08:37:47Z" level=info msg="Adding server to load balancer rke2-agent-load-balancer: 192.168.49.1:9345"
time="2024-12-03T08:37:47Z" level=info msg="Removing server from load balancer rke2-agent-load-balancer: controlplane.clustertest.hdk:9345"
time="2024-12-03T08:37:47Z" level=info msg="Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [192.168.49.1:9345] [default: controlplane.clustertest.hdk:9345]"
time="2024-12-03T08:37:47Z" level=info msg="Adding server to load balancer rke2-api-server-agent-load-balancer: controlplane.clustertest.hdk:6443"
time="2024-12-03T08:37:47Z" level=info msg="Adding server to load balancer rke2-api-server-agent-load-balancer: 192.168.49.1:6443"
time="2024-12-03T08:37:47Z" level=info msg="Removing server from load balancer rke2-api-server-agent-load-balancer: controlplane.clustertest.hdk:6443"
time="2024-12-03T08:37:47Z" level=info msg="Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [192.168.49.1:6443] [default: controlplane.clustertest.hdk:6443]"
time="2024-12-03T08:37:48Z" level=warning msg="Host resolv.conf includes loopback or multicast nameservers - kubelet will use autogenerated resolv.conf with nameserver 8.8.8.8"
time="2024-12-03T08:37:48Z" level=info msg="Runtime image index.docker.io/rancher/rke2-runtime:v1.30.6-rke2r1-windows-amd64 bin and charts directories already exist; skipping extract"
time="2024-12-03T08:37:48Z" level=info msg="Setting up Flannel CNI"
time="2024-12-03T08:37:48Z" level=info msg="Flannel required config files ready"
time="2024-12-03T08:37:48Z" level=info msg="Windows bootstrap okay. Exiting setup."
time="2024-12-03T08:37:48Z" level=info msg="Logging containerd to C:\\var\\lib\\rancher\\rke2\\agent\\containerd\\containerd.log"
time="2024-12-03T08:37:48Z" level=info msg="Running containerd -c C:\\var\\lib\\rancher\\rke2\\agent\\etc\\containerd\\config.toml"
time="2024-12-03T08:37:49Z" level=info msg="containerd is now running"
time="2024-12-03T08:37:49Z" level=info msg="Pulling images from C:\\var\\lib\\rancher\\rke2\\agent\\images\\runtime-image.txt"
time="2024-12-03T08:37:49Z" level=info msg="Pulling image index.docker.io/rancher/rke2-runtime:v1.30.6-rke2r1-windows-amd64"
time="2024-12-03T08:37:54Z" level=error msg="Error encountered while importing C:\\var\\lib\\rancher\\rke2\\agent\\images\\runtime-image.txt: failed to pull images from C:\\var\\lib\\rancher\\rke2\\agent\\images\\runtime-image.txt: rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/rancher/rke2-runtime:v1.30.6-rke2r1-windows-amd64\": failed to extract layer sha256:dbfb9330c9a9cc4ccb1aa1592931f1efe0d7cf794aa6f5fadc25a50944d54736: hcsshim::ProcessBaseLayer \\\\?\\C:\\var\\lib\\rancher\\rke2\\agent\\containerd\\io.containerd.snapshotter.v1.windows\\snapshots\\21: The system cannot find the path specified.: unknown"
time="2024-12-03T08:37:54Z" level=info msg="Getting list of apiserver endpoints from server"
time="2024-12-03T08:37:54Z" level=info msg="Got apiserver addresses from supervisor: [192.168.49.1:6443]"
time="2024-12-03T08:37:54Z" level=info msg="Connecting to proxy" url="wss://192.168.49.1:9345/v1-rke2/connect"
time="2024-12-03T08:37:54Z" level=info msg="Creating rke2-cert-monitor event broadcaster"
time="2024-12-03T08:37:54Z" level=info msg="Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --client-ca-file=C:\\var\\lib\\rancher\\rke2\\agent\\client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=npipe:////./pipe/containerd-containerd --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true --healthz-bind-address=127.0.0.1 --hostname-override=windows-agent01 --kubeconfig=C:\\var\\lib\\rancher\\rke2\\agent\\kubelet.kubeconfig --log-file=\\var\\lib\\rancher\\rke2\\agent\\logs\\kubelet.log --log-file-max-size=50 --logtostderr=false --node-ip=192.168.51.1 --node-labels= --pod-manifest-path=C:\\var\\lib\\rancher\\rke2\\agent\\pod-manifests --read-only-port=0 --resolv-conf=C:\\var\\lib\\rancher\\rke2\\agent\\etc\\resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=C:\\var\\lib\\rancher\\rke2\\agent\\serving-kubelet.crt --tls-private-key-file=C:\\var\\lib\\rancher\\rke2\\agent\\serving-kubelet.key"
time="2024-12-03T08:37:54Z" level=info msg="Running RKE2 kubelet [--cgroups-per-qos=false --enforce-node-allocatable= --file-check-frequency=5s --hairpin-mode=promiscuous-bridge --resolv-conf= --sync-frequency=30s --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --client-ca-file=C:\\var\\lib\\rancher\\rke2\\agent\\client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=npipe:////./pipe/containerd-containerd --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true --healthz-bind-address=127.0.0.1 --hostname-override=windows-agent01 --kubeconfig=C:\\var\\lib\\rancher\\rke2\\agent\\kubelet.kubeconfig --node-ip=192.168.51.1 --node-labels= --pod-manifest-path=C:\\var\\lib\\rancher\\rke2\\agent\\pod-manifests --read-only-port=0 --resolv-conf=C:\\var\\lib\\rancher\\rke2\\agent\\etc\\resolv.conf --serialize-image-pulls=false --tls-cert-file=C:\\var\\lib\\rancher\\rke2\\agent\\serving-kubelet.crt --tls-private-key-file=C:\\var\\lib\\rancher\\rke2\\agent\\serving-kubelet.key]"
time="2024-12-03T08:37:54Z" level=info msg="Node windows-agent01 registered. Flanneld can start"
time="2024-12-03T08:37:54Z" level=info msg="Flanneld Envs: [NODE_NAME=windows-agent01 PATH=C:\\var\\lib\\rancher\\rke2\\data\\v1.30.6-rke2r1-windows-amd64-70dc85671177\\bin;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;c:\\var\\lib\\rancher\\rke2\\bin;c:\\usr\\local\\bin;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;] and args: [--kubeconfig-file=c:\\var\\lib\\rancher\\rke2\\agent\\flannel.kubeconfig --ip-masq --kube-subnet-mgr --iptables-forward-rules=false --iface=192.168.51.1 --net-config-path=c:\\var\\lib\\rancher\\rke2\\agent\\flanneld-net-conf.json]"
time="2024-12-03T08:37:54Z" level=info msg="Remotedialer connected to proxy" url="wss://192.168.49.1:9345/v1-rke2/connect"
time="2024-12-03T08:37:54Z" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=windows-agent01 --kubeconfig=C:\\var\\lib\\rancher\\rke2\\agent\\kubeproxy.kubeconfig --proxy-mode=kernelspace"
time="2024-12-03T08:37:54Z" level=info msg="Reserving an IP on flannel HNS network for kube-proxy source vip"
time="2024-12-03T08:37:54Z" level=info msg="Flannel HNS network ready with managementIP: 192.168.51.1"
time="2024-12-03T08:37:54Z" level=info msg="Annotations and labels have been set successfully on node: windows-agent01"
time="2024-12-03T08:37:54Z" level=info msg="Source VIP for kube-proxy was already reserved [10.42.4.2]"
time="2024-12-03T08:37:54Z" level=info msg="Reserved VIP for kube-proxy: 10.42.4.2"
time="2024-12-03T08:37:54Z" level=info msg="HCN feature check" supportedFeatures="{{true true true true} {true true} true true true true true true true true true true true false false false false false}" version="{13 3}"
time="2024-12-03T08:37:54Z" level=info msg="WinDSR support is enabled"
time="2024-12-03T08:37:54Z" level=info msg="Running RKE2 kube-proxy [--bind-address=192.168.51.1 --enable-dsr=true --feature-gates=WinDSR=true --network-name=flannel.4096 --source-vip=10.42.4.2 --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=windows-agent01 --kubeconfig=C:\\var\\lib\\rancher\\rke2\\agent\\kubeproxy.kubeconfig --proxy-mode=kernelspace]"
time="2024-12-03T08:37:58Z" level=info msg="Tunnel authorizer set Kubelet Port 0.0.0.0:10250"
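The warning about the host resolv.conf refers to a nameserver sanity check. The sketch below re-creates the kind of check the message describes, using Python's ipaddress module; it is an illustration under that assumption, not RKE2's actual code, and both function names are hypothetical. Note that this fallback file is what kubelet receives as --resolv-conf (the upstream resolver configuration); pods with the default ClusterFirst DNS policy are pointed at the --cluster-dns address 10.43.0.10 instead, which matches the server nslookup reports.

```python
import ipaddress


def nameservers(resolv_conf: str) -> list[str]:
    """Return the addresses listed on `nameserver` lines of a resolv.conf body."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def has_unusable_nameserver(resolv_conf: str) -> bool:
    """True if any listed nameserver is a loopback or multicast address."""
    for server in nameservers(resolv_conf):
        try:
            addr = ipaddress.ip_address(server)
        except ValueError:
            continue  # ignore entries that are not plain IP addresses
        if addr.is_loopback or addr.is_multicast:
            return True
    return False
```

For example, a systemd-resolved stub file containing nameserver 127.0.0.53 is flagged, while one containing nameserver 192.168.1.1 is not.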

Flanneld logs:

I1203 08:37:55.032318    4188 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile:c:\var\lib\rancher\rke2\agent\flannel.kubeconfig iface:[192.168.51.1] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:false netConfPath:c:\var\lib\rancher\rke2\agent\flanneld-net-conf.json setNodeNetworkUnavailable:true}
I1203 08:37:55.176105    4188 kube.go:139] Waiting 10m0s for node controller to sync
I1203 08:37:55.176105    4188 kube.go:469] Starting kube subnet manager
I1203 08:37:55.244661    4188 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.0.0/24]
I1203 08:37:55.244661    4188 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.1.0/24]
I1203 08:37:55.244661    4188 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.2.0/24]
I1203 08:37:55.244661    4188 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.3.0/24]
I1203 08:37:55.244661    4188 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.5.0/24]
I1203 08:37:56.191306    4188 kube.go:146] Node controller sync successful
I1203 08:37:56.191306    4188 main.go:231] Created subnet manager: Kubernetes Subnet Manager - windows-agent01
I1203 08:37:56.191306    4188 main.go:234] Installing signal handlers
I1203 08:37:56.191306    4188 main.go:452] Found network config - Backend type: vxlan
I1203 08:37:56.194124    4188 kube.go:669] List of node(windows-agent01) annotations: map[string]string{"alpha.kubernetes.io/provided-node-ip":"192.168.51.1", "flannel.alpha.coreos.com/backend-data":"{\"VNI\":4096,\"VtepMAC\":\"00:15:5d:4d:e9:d7\"}", "flannel.alpha.coreos.com/backend-type":"vxlan", "flannel.alpha.coreos.com/kube-subnet-manager":"true", "flannel.alpha.coreos.com/public-ip":"192.168.51.1", "node.alpha.kubernetes.io/ttl":"0", "rke2.io/hostname":"windows-agent01", "rke2.io/internal-ip":"192.168.51.1", "rke2.io/node-args":"[\"agent\",\"--server\",\"https://controlplane.clustertest.hdk:9345\",\"--token\",\"********\",\"--server\",\"https://controlplane.clustertest.hdk:9345\",\"--token\",\"********\",\"--token\",\"********\"]", "rke2.io/node-config-hash":"KBH7366QFJMYGP7XVKW2RW3QQURPI5UW6ZN23LZJFSDEU4A23UTA====", "rke2.io/node-env":"{}", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
I1203 08:37:56.194124    4188 match.go:74] Searching for interface using 192.168.51.1
I1203 08:37:56.199938    4188 match.go:264] Using interface with name vEthernet (Ethernet) and address 192.168.51.1
I1203 08:37:56.199938    4188 match.go:286] Defaulting external address to interface address (192.168.51.1)
I1203 08:37:56.199938    4188 vxlan_windows.go:126] VXLAN config: Name=flannel.4096 MacPrefix=0E-2A VNI=4096 Port=4789 GBP=false DirectRouting=false
time="2024-12-03T08:37:56Z" level=info msg="HCN feature check" supportedFeatures="{{true true true true} {true true} true true true true true true true true true true true false false false false false false}" version="{13 3}"
I1203 08:37:56.223607    4188 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.5.0/24]
I1203 08:37:56.225214    4188 device_windows.go:114] Found existing HostComputeNetwork flannel.4096
I1203 08:37:56.225759    4188 device_windows.go:234] Waiting to get net interface for HostComputeNetwork flannel.4096 (192.168.51.1)
I1203 08:37:56.228926    4188 device_windows.go:243] Host interface: vEthernet (Ethernet) bound by flannel.4096 ready
I1203 08:37:56.238354    4188 iptables_windows.go:39] Starting flannel in windows mode...
I1203 08:37:56.238354    4188 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.5.0/24]
W1203 08:37:56.242962    4188 main.go:540] no subnet found for key: FLANNEL_IPV6_NETWORK in file: /run/flannel/subnet.env
W1203 08:37:56.242962    4188 main.go:540] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
W1203 08:37:56.242962    4188 iptables_windows.go:50] unimplemented
I1203 08:37:56.244600    4188 main.go:396] Wrote subnet file to /run/flannel/subnet.env
I1203 08:37:56.244600    4188 main.go:400] Running backend.
I1203 08:37:56.244600    4188 vxlan_network_windows.go:63] Watching for new subnet leases
I1203 08:37:56.244600    4188 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xc0a83101, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x35, 0x32, 0x3a, 0x62, 0x34, 0x3a, 0x66, 0x61, 0x3a, 0x61, 0x61, 0x3a, 0x37, 0x38, 0x3a, 0x32, 0x32, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I1203 08:37:56.244600    4188 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xc0a83201, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x35, 0x36, 0x3a, 0x39, 0x36, 0x3a, 0x38, 0x62, 0x3a, 0x65, 0x38, 0x3a, 0x65, 0x63, 0x3a, 0x32, 0x33, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I1203 08:37:56.249294    4188 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0200, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xc0a83202, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x61, 0x65, 0x3a, 0x37, 0x35, 0x3a, 0x34, 0x39, 0x3a, 0x66, 0x32, 0x3a, 0x66, 0x35, 0x3a, 0x35, 0x33, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I1203 08:37:56.253195    4188 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0300, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xc0a83203, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x62, 0x65, 0x3a, 0x38, 0x62, 0x3a, 0x32, 0x39, 0x3a, 0x33, 0x37, 0x3a, 0x35, 0x33, 0x3a, 0x32, 0x36, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I1203 08:37:56.254275    4188 main.go:421] Waiting for all goroutines to exit
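The lease events above print the subnet and public IP as 32-bit integers and the backend data as raw JSON bytes. The following sketch decodes them with Python's standard library; decode_lease is a hypothetical helper, and the sample values are copied from the first lease event.

```python
import ipaddress
import json


def decode_lease(subnet_ip: int, prefix_len: int, public_ip: int, backend_data: list[int]) -> dict:
    """Convert the integer/byte fields of a flannel lease log line into readable values."""
    subnet = ipaddress.IPv4Network((subnet_ip, prefix_len))
    public = ipaddress.IPv4Address(public_ip)
    backend = json.loads(bytes(backend_data).decode("utf-8"))
    return {"subnet": str(subnet), "public_ip": str(public), "backend": backend}


# Fields copied from the first "Batch elem [0]" line above.
FIRST_LEASE = decode_lease(
    0xA2A0000,
    0x18,
    0xC0A83101,
    [
        0x7B, 0x22, 0x56, 0x4E, 0x49, 0x22, 0x3A, 0x34, 0x30, 0x39, 0x36,
        0x2C, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4D, 0x41, 0x43, 0x22, 0x3A,
        0x22, 0x35, 0x32, 0x3A, 0x62, 0x34, 0x3A, 0x66, 0x61, 0x3A, 0x61,
        0x61, 0x3A, 0x37, 0x38, 0x3A, 0x32, 0x32, 0x22, 0x7D,
    ],
)
```

Decoded this way, the first event is 10.42.0.0/24 on 192.168.49.1 (the server address in the agent log) with VNI 4096, and the other three are 10.42.1.0/24, 10.42.2.0/24 and 10.42.3.0/24 on 192.168.50.1, 192.168.50.2 and 192.168.50.3.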

Deploying a load balancer such as MetalLB, exposing the Linux service through it, and accessing the exposed IP from within the Windows container works. Only DNS requests are dropped.

Follow-up

Might the problem be related to this?

time="2024-12-03T08:37:54Z" level=error msg="Error encountered while importing C:\\var\\lib\\rancher\\rke2\\agent\\images\\runtime-image.txt: failed to pull images from C:\\var\\lib\\rancher\\rke2\\agent\\images\\runtime-image.txt: rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/rancher/rke2-runtime:v1.30.6-rke2r1-windows-amd64\": failed to extract layer sha256:dbfb9330c9a9cc4ccb1aa1592931f1efe0d7cf794aa6f5fadc25a50944d54736: hcsshim::ProcessBaseLayer \\\\?\\C:\\var\\lib\\rancher\\rke2\\agent\\containerd\\io.containerd.snapshotter.v1.windows\\snapshots\\21: The system cannot find the path specified.: unknown"

I cannot pull any of the rke2-runtime images with ctr i pull either; they all fail with the same error.

Follow-up 2

After investigating further, I found that none of the in-cluster IP addresses are reachable from Windows. They are reachable from Linux; I can even curl the service running on Windows.
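One consistency check the logs above allow: in the usual Windows overlay setup, kube-proxy's --source-vip is an address reserved from the node's own pod subnet (treat this as an assumption). The agent log reports the source VIP 10.42.4.2 as "already reserved", while flannel leases 10.42.5.0/24 to windows-agent01 and none of the current nodes' PodCIDRs is 10.42.4.0/24. The sketch below only compares the two values with Python's ipaddress module; vip_in_pod_cidr is a hypothetical helper, and the thread does not confirm that a mismatch here is the cause.

```python
import ipaddress


def vip_in_pod_cidr(source_vip: str, pod_cidr: str) -> bool:
    """Return True if kube-proxy's source VIP lies inside the node's pod CIDR."""
    return ipaddress.ip_address(source_vip) in ipaddress.ip_network(pod_cidr)


# Values copied from the agent and flanneld logs in this issue.
SOURCE_VIP = "10.42.4.2"           # "Reserved VIP for kube-proxy"
WINDOWS_POD_CIDR = "10.42.5.0/24"  # node lease for windows-agent01

MATCHES = vip_in_pod_cidr(SOURCE_VIP, WINDOWS_POD_CIDR)  # False for these values
```

A False result only flags an inconsistency worth investigating; re-creating the VM, as the reporter eventually did, also starts from a host with no leftover HNS state.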

@manuelbuil
Contributor

There is a known issue in Windows Server 2022, introduced by the July patch, that breaks communication with Windows servers. If your kernel version is greater than 10.0.20348.2031, it is going to fail. This is the fix: microsoft/Windows-Containers#516 (comment)
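Checking whether a node is past the quoted threshold means comparing the build numbers numerically, not as strings. A minimal sketch, assuming the four-part dotted form used in the comment (for example 10.0.20348.2031); the helper names are hypothetical and the threshold is the one quoted above.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted Windows version such as '10.0.20348.2031' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


# Last build mentioned as unaffected in the maintainer's comment.
THRESHOLD = "10.0.20348.2031"


def past_threshold(version: str) -> bool:
    """True if `version` is greater than THRESHOLD, compared component by component."""
    return parse_version(version) > parse_version(THRESHOLD)
```

Tuple comparison orders 10.0.20348.999 below 10.0.20348.2031, whereas a plain string comparison would put it above, because "9" sorts after "2".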

@rabejens
Author

rabejens commented Dec 3, 2024

Unfortunately, this fix did not solve my issue.

I still get DNS timeouts. Where should I look next?

I can expose the service with a LoadBalancer and access its IP address with no problem from the outside, so routing into the Windows container DOES work. The containers just don't seem to see the pod network, but there must be connectivity.
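When some paths work (the LoadBalancer IP) and others do not (pod and ClusterIP addresses), it can help to probe a list of targets the same way and compare the outcomes. This is a minimal TCP sketch, assuming Python is available in a Windows test pod (the helloworld image may not include it); probe_tcp is a hypothetical helper. Apart from 10.43.0.10 port 53 (CoreDNS Services normally also serve TCP on 53, an assumption for this cluster), the targets are placeholders to fill in from kubectl get svc,pods -o wide.

```python
import socket


def probe_tcp(targets: list[tuple[str, int]], timeout: float = 2.0) -> dict[str, str]:
    """Try a TCP connect to each (ip, port) and record 'open', 'refused' or 'timeout'."""
    results = {}
    for ip, port in targets:
        key = f"{ip}:{port}"
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                results[key] = "open"
        except ConnectionRefusedError:
            # Something answered with a reset: the route works, the port is closed.
            results[key] = "refused"
        except socket.timeout:
            # No answer at all: packets are likely being dropped on the way.
            results[key] = "timeout"
        except OSError as exc:
            results[key] = f"error: {exc.strerror or exc}"
    return results
```

Distinguishing "refused" from "timeout" is the point of the design: a refusal proves the packet reached a host, while a timeout points at routing, policy or NAT on the path.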

@rabejens
Author

rabejens commented Dec 3, 2024

Follow-up:

I tore down and re-created the Windows VM. Right after running

Enable-WindowsOptionalFeature -Online -FeatureName containers -All

and rebooting, I applied the registry settings from the fix. I then proceeded with the installation, and after that it worked.

Thanks!

rabejens closed this as completed Dec 3, 2024