VM NetBoot / NetworkBoot fails on some nodes: inf7 inf8 or inf44 #208

Closed
rbo opened this issue Oct 16, 2024 · 6 comments
Assignees
Labels
bug Something isn't working cluster/isar BareMetal COE Cluter

Comments

@rbo
Member

rbo commented Oct 16, 2024

On inf44:

(screenshot)

On ucs56:

(screenshot)

@rbo rbo added bug Something isn't working cluster/isar BareMetal COE Cluter labels Oct 16, 2024
@rbo rbo self-assigned this Oct 16, 2024
@rbo
Member Author

rbo commented Oct 16, 2024

/cc @DanielFroehlich

Moving the VM (MAC: 0e:c0:ef:20:63:10) to ucs57 and watching the network traffic:

$ oc debug node/ucs57
sh-5.1# tcpdump -i coe-bridge -n port 67 and port 68
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on coe-bridge, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:01:53.170785 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from f8:f2:1e:db:6c:f0, length 281
15:01:54.699247 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 6c:fe:54:4b:12:59, length 281
15:01:54.993995 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 347
15:01:54.995331 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
15:01:58.063646 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 359
15:01:58.064036 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
15:01:58.064165 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 291
15:02:03.104924 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 340
15:02:03.105368 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 338
15:02:03.814602 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:49, length 281
15:02:06.408511 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:b5:00:00:06, length 296
15:02:07.084158 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 352
15:02:07.084529 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 338
^C
13 packets captured
13 packets received by filter
0 packets dropped by kernel
sh-5.1# exit
exit

Running the VM on inf44:

oc debug node/inf44
Starting pod/inf44-debug-q7pz8 ...
To use host binaries, run `chroot /host`
Pod IP: 10.32.96.44
If you don't see a command prompt, try pressing enter.
sh-5.1#
sh-5.1#
sh-5.1#
sh-5.1# tcpdump -i coe-bridge -n port 67 and port 68
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on coe-bridge, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:04:55.635980 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:06:8e:2e, length 284
15:04:56.811352 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:05:07.476090 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from f8:f2:1e:db:6c:f0, length 281
15:05:08.805167 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 6c:fe:54:4b:12:59, length 281
15:05:17.792533 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:49, length 281
15:05:20.355751 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:b5:00:00:06, length 296


15:05:35.620308 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:48, length 281


15:06:00.494174 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:06:8e:2e, length 284
15:06:01.613988 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:06:11.971938 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from f8:f2:1e:db:6c:f0, length 281
15:06:13.419260 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 6c:fe:54:4b:12:59, length 281
15:06:22.662867 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:49, length 281
15:06:25.328194 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:b5:00:00:06, length 296
15:06:39.813851 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:48, length 281
15:06:48.913960 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:06:51.244706 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:06:56.029003 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:07:04.687854 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:07:05.036995 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:06:8e:2e, length 284
15:07:05.839964 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
^C
20 packets captured
20 packets received by filter
0 packets dropped by kernel
sh-5.1#

No DHCP requests from the VM's MAC (0e:c0:ef:20:63:10) are visible on inf44, and there are no DHCP replies at all.
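
To compare the two captures at a glance, the saved tcpdump output can be summarised per client MAC. This is a minimal sketch, assuming the text output shown above was saved to a file; `summarize_dhcp` is a hypothetical helper name, not part of tcpdump:

```shell
#!/bin/sh
# summarize_dhcp (hypothetical helper): reads `tcpdump -n` text output from
# the given files (or stdin) and prints the number of DHCP requests per
# client MAC plus the total number of DHCP replies.
summarize_dhcp() {
  awk '
    /BOOTP\/DHCP, Request from/ {
      mac = $0
      sub(/.*Request from /, "", mac)   # drop everything before the MAC
      sub(/,.*/, "", mac)               # drop ", length NNN"
      req[mac]++
    }
    /BOOTP\/DHCP, Reply/ { replies++ }
    END {
      for (m in req) printf "request %s %d\n", m, req[m]
      printf "replies %d\n", replies + 0
    }
  ' "$@" | LC_ALL=C sort
}
```

Fed the ucs57 capture above, this would list 0e:c0:ef:20:63:10 with 5 requests and report 4 replies. Fed the inf44 capture, it would not list that MAC and would report 0 replies.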

@rbo
Member Author

rbo commented Oct 16, 2024

Comparing the virt-launcher pods:

diff -Nuar /tmp/inf44.yaml /tmp/ucs57.yaml
--- /tmp/inf44.yaml	2024-10-16 17:08:10.395589448 +0200
+++ /tmp/ucs57.yaml	2024-10-16 17:09:50.658052559 +0200
@@ -2,15 +2,15 @@
 kind: Pod
 metadata:
   annotations:
-    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.8.214/21"],"mac_address":"0a:58:0a:80:08:d6","gateway_ips":["10.128.8.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.8.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.8.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.8.1"}],"ip_address":"10.128.8.214/21","gateway_ip":"10.128.8.1"}}'
+    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.130.11.54/21"],"mac_address":"0a:58:0a:82:0b:36","gateway_ips":["10.130.8.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.130.8.1"},{"dest":"172.30.0.0/16","nextHop":"10.130.8.1"},{"dest":"100.64.0.0/16","nextHop":"10.130.8.1"}],"ip_address":"10.130.11.54/21","gateway_ip":"10.130.8.1"}}'
     k8s.v1.cni.cncf.io/network-status: |-
       [{
           "name": "ovn-kubernetes",
           "interface": "eth0",
           "ips": [
-              "10.128.8.214"
+              "10.130.11.54"
           ],
-          "mac": "0a:58:0a:80:08:d6",
+          "mac": "0a:58:0a:82:0b:36",
           "default": true,
           "dns": {}
       },{
@@ -23,7 +23,7 @@
     kubectl.kubernetes.io/default-container: compute
     kubevirt.io/domain: ushift16-ostree
     kubevirt.io/migrationTransportUnix: "true"
-    kubevirt.io/vm-generation: "16"
+    kubevirt.io/vm-generation: "19"
     openshift.io/scc: kubevirt-controller
     post.hook.backup.velero.io/command: '["/usr/bin/virt-freezer", "--unfreeze", "--name",
       "ushift16-ostree", "--namespace", "rbohne-debug"]'
@@ -32,15 +32,15 @@
       "ushift16-ostree", "--namespace", "rbohne-debug"]'
     pre.hook.backup.velero.io/container: compute
     seccomp.security.alpha.kubernetes.io/pod: localhost/kubevirt/kubevirt.json
-  creationTimestamp: "2024-10-16T15:05:29Z"
+  creationTimestamp: "2024-10-16T15:09:03Z"
   generateName: virt-launcher-ushift16-ostree-
   labels:
     kubevirt.io: virt-launcher
-    kubevirt.io/created-by: 0fb45324-f59e-4a89-9335-60e9d45ef927
-    kubevirt.io/nodeName: inf44
+    kubevirt.io/created-by: 33951148-4769-4319-bef3-3ccb3e472032
+    kubevirt.io/nodeName: ucs57
     vm.kubevirt.io/name: ushift16-ostree
     vm_group: cluster_ushift16_ostree
-  name: virt-launcher-ushift16-ostree-wd9nv
+  name: virt-launcher-ushift16-ostree-wsgb7
   namespace: rbohne-debug
   ownerReferences:
   - apiVersion: kubevirt.io/v1
@@ -48,9 +48,9 @@
     controller: true
     kind: VirtualMachineInstance
     name: ushift16-ostree
-    uid: 0fb45324-f59e-4a89-9335-60e9d45ef927
-  resourceVersion: "1638035362"
-  uid: 54bc701b-f3aa-43c4-af8d-e917a8ed47f2
+    uid: 33951148-4769-4319-bef3-3ccb3e472032
+  resourceVersion: "1638049339"
+  uid: be3ea479-6a84-4e29-82c6-178810925c56
 spec:
   affinity:
     nodeAffinity:
@@ -64,11 +64,11 @@
   - command:
     - /usr/bin/virt-launcher-monitor
     - --qemu-timeout
-    - 332s
+    - 266s
     - --name
     - ushift16-ostree
     - --uid
-    - 0fb45324-f59e-4a89-9335-60e9d45ef927
+    - 33951148-4769-4319-bef3-3ccb3e472032
     - --namespace
     - rbohne-debug
     - --kubevirt-share-dir
@@ -166,10 +166,10 @@
   hostname: ushift16-ostree
   imagePullSecrets:
   - name: default-dockercfg-r96q8
-  nodeName: inf44
+  nodeName: ucs57
   nodeSelector:
     kubernetes.io/arch: amd64
-    kubernetes.io/hostname: inf44
+    kubernetes.io/hostname: ucs57
     kubevirt.io/schedulable: "true"
   preemptionPolicy: PreemptLowerPriority
   priority: 0
@@ -229,34 +229,34 @@
     name: hotplug-disks
 status:
   conditions:
-  - lastProbeTime: "2024-10-16T15:05:29Z"
-    lastTransitionTime: "2024-10-16T15:05:29Z"
+  - lastProbeTime: "2024-10-16T15:09:03Z"
+    lastTransitionTime: "2024-10-16T15:09:03Z"
     message: the virtual machine is not paused
     reason: NotPaused
     status: "True"
     type: kubevirt.io/virtual-machine-unpaused
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:37Z"
+    lastTransitionTime: "2024-10-16T15:09:10Z"
     status: "True"
     type: PodReadyToStartContainers
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:29Z"
+    lastTransitionTime: "2024-10-16T15:09:03Z"
     status: "True"
     type: Initialized
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:37Z"
+    lastTransitionTime: "2024-10-16T15:09:10Z"
     status: "True"
     type: Ready
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:37Z"
+    lastTransitionTime: "2024-10-16T15:09:10Z"
     status: "True"
     type: ContainersReady
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:29Z"
+    lastTransitionTime: "2024-10-16T15:09:03Z"
     status: "True"
     type: PodScheduled
   containerStatuses:
-  - containerID: cri-o://7c0f2e9be660bd79a066b3a1285ebb8a701d3abb57cab5150c954a5579390606
+  - containerID: cri-o://303e88cb684c686c47ae110003b3058c2cf054c13cb063119a6b14a2b052d939
     image: registry.redhat.io/container-native-virtualization/virt-launcher-rhel9@sha256:444191284ff0adb7e38d4786a037a0c39a340cfea6b3a943951c8a3dc79dacb2
     imageID: registry.redhat.io/container-native-virtualization/virt-launcher-rhel9@sha256:2961c32db99ee3af67c299417207b4f714d3dd007f3b02e1443d36839b375bec
     lastState: {}
@@ -266,13 +266,13 @@
     started: true
     state:
       running:
-        startedAt: "2024-10-16T15:05:36Z"
-  hostIP: 10.32.96.44
+        startedAt: "2024-10-16T15:09:09Z"
+  hostIP: 10.32.96.57
   hostIPs:
-  - ip: 10.32.96.44
+  - ip: 10.32.96.57
   phase: Running
-  podIP: 10.128.8.214
+  podIP: 10.130.11.54
   podIPs:
-  - ip: 10.128.8.214
+  - ip: 10.130.11.54
   qosClass: Burstable
-  startTime: "2024-10-16T15:05:29Z"
+  startTime: "2024-10-16T15:09:03Z"
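
Most of the differences in the diff above are values that change on every pod start (UIDs, timestamps, IP addresses). A rough sketch of masking them before diffing, so that any remaining differences stand out; `mask_volatile` is a hypothetical helper, and the three patterns are assumptions about which values count as noise:

```shell
#!/bin/sh
# mask_volatile (hypothetical helper): replaces UUIDs, RFC 3339 UTC
# timestamps and IPv4 addresses with fixed placeholders.
mask_volatile() {
  sed -E \
    -e 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/<uuid>/g' \
    -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z/<time>/g' \
    -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/<ip>/g' \
    "$@"
}

# Possible usage with the two files compared above:
#   mask_volatile /tmp/inf44.yaml > /tmp/inf44.masked.yaml
#   mask_volatile /tmp/ucs57.yaml > /tmp/ucs57.masked.yaml
#   diff -u /tmp/inf44.masked.yaml /tmp/ucs57.masked.yaml
```

Node names, pod name suffixes, MAC addresses and container IDs are deliberately left unmasked, so they would still show up in the reduced diff.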

@DanielFroehlich

@rbo right - that fits my observation that there are no log entries on the DHCP server either. The fact that no PXE/HTTP boot options are visible in the BIOS makes me think that the VM behaves as if it has no NIC, or the NIC is not connected to a network.

@rbo
Member Author

rbo commented Oct 16, 2024

The NIC is available; I checked in the BIOS settings (running on inf44):

Screenshot 2024-10-16 at 17 07 00

I was not able to get into the BIOS settings while the VM was running on ucs57.

@rbo rbo changed the title from "VM NetworkBook fails on some nodes: inf7 inf8 or inf44" to "VM NetBoot / NetworkBoot fails on some nodes: inf7 inf8 or inf44" May 8, 2025
@rbo
Member Author

rbo commented May 8, 2025

DHCP Config

host netboot {
  hardware ethernet 0E:C0:EF:20:62:06;
  fixed-address 10.32.98.6;
  option host-name "netboot";
  option domain-name "coe.muc.redhat.com";
  ddns-hostname "netboot.coe.muc.redhat.com";
  filename "http://ushift-imgbld.stormshift.coe.muc.redhat.com/ushift-bootc-install-iso/EFI/BOOT/BOOTX64.EFI";
}

VM

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: netboot
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: netboot-root
    spec:
      source:
        blank: {}
      storage:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 5Gi
        storageClassName: ocs-storagecluster-ceph-rbd-virtualization
        volumeMode: Block
  running: true
  template:
    metadata:
      creationTimestamp: null
    spec:
      architecture: amd64
      domain:
        clock:
          timezone: Etc/GMT
        cpu:
          cores: 3
          sockets: 1
          threads: 1
        devices:
          disks:
          - disk:
              bus: virtio
            name: root-disk
            bootOrder: 2
          interfaces:
          - bridge: {}
            bootOrder: 1
            macAddress: 0E:C0:EF:20:62:06
            model: virtio
            name: net-0
        machine:
          type: q35
        memory:
          guest: 16Gi
        resources:
          requests:
            cpu: 1500m
            memory: 16Gi
      networks:
      - multus:
          networkName: coe-bridge
        name: net-0
      nodeSelector:
        kubernetes.io/hostname: inf44
      volumes:
      - dataVolume:
          name: netboot-root
        name: root-disk
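
The fixed-address entry only applies if the MAC in the DHCP host block matches the `macAddress` of the VM interface. A small sketch for checking that, assuming the two snippets above are saved as local files; `dhcp_mac`, `vm_mac` and the file names are hypothetical:

```shell
#!/bin/sh
# dhcp_mac (hypothetical helper): first "hardware ethernet" value in a
# dhcpd config file, lower-cased.
dhcp_mac() {
  sed -n 's/^[[:space:]]*hardware ethernet[[:space:]]*\([0-9A-Fa-f:]*\);.*/\1/p' "$1" \
    | head -n 1 | tr 'A-F' 'a-f'
}

# vm_mac (hypothetical helper): first macAddress value in a VirtualMachine
# manifest, lower-cased; an optional opening double quote is skipped.
vm_mac() {
  sed -n 's/^[[:space:]]*macAddress:[[:space:]]*"\{0,1\}\([0-9A-Fa-f:]*\).*/\1/p' "$1" \
    | head -n 1 | tr 'A-F' 'a-f'
}

# Possible usage (file names are assumptions):
#   [ "$(dhcp_mac dhcpd-netboot.conf)" = "$(vm_mac netboot-vm.yaml)" ] \
#     && echo "MAC addresses match" || echo "MAC mismatch"
```

For the config and manifest shown above, both helpers would return 0e:c0:ef:20:62:06.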

@rbo
Member Author

rbo commented May 8, 2025

  • NetBoot on inf44 : ✅ WORKS

@rbo rbo closed this as completed May 8, 2025