
Convert K8s-prod nodes k8s-node-7 and k8s-node-8 from VMs to bare-metal #47

Open
nickatnceas opened this issue Aug 15, 2024 · 5 comments

nickatnceas (Contributor) commented Aug 15, 2024

K8s-prod nodes k8s-node-7 and k8s-node-8 currently run as VMs on physical hosts host-ucsb-24 and host-ucsb-25. Deleting the node VMs and redeploying each node directly on its host will let us use the memory previously reserved for the host, and should provide a small performance boost (roughly 5%?) from removing the virtualization layer.

Since these nodes do not benefit from live migration (i.e., they can be drained at any time without major service interruptions), and since the physical hosts will not be sharing resources with any other VMs, there is no benefit to running them as VMs in this case.
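
For reference, draining and removing one of these node VMs before redeploying on bare metal would look roughly like this (a sketch using standard kubectl commands; node name taken from this issue):

# Evict workloads from the VM node (DaemonSet pods stay; emptyDir data is discarded)
kubectl drain k8s-node-7 --ignore-daemonsets --delete-emptydir-data

# After the VM is decommissioned, remove the node object from the cluster
kubectl delete node k8s-node-7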

Dev nodes will move from hosts 24 and 25 to hosts 9 and 10, and will increase from 16 to 32 vCPUs; the resulting capacity can be verified as sketched after the lists below.

Current:

  • host-ucsb-24
    • k8s-node-7 VM
      • 256 vCPUs
      • 352 GB memory
    • k8s-dev-node-4 VM
      • 16 vCPUs
      • 128 GB memory
  • host-ucsb-25
    • k8s-node-8 VM
      • 256 vCPUs
      • 352 GB memory
    • k8s-dev-node-5 VM
      • 16 vCPUs
      • 128 GB memory

Planned:

  • host-ucsb-24 (bare-metal k8s-node-7)
    • 256 vCPUs
    • 512 GB memory
  • host-ucsb-25 (bare-metal k8s-node-8)
    • 256 vCPUs
    • 512 GB memory
  • host-ucsb-9
    • k8s-dev-node-4 VM
      • 32 vCPUs
      • 128 GB memory
  • host-ucsb-10
    • k8s-dev-node-5 VM
      • 32 vCPUs
      • 128 GB memory
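
Once the migration is done, the new capacity can be confirmed from the cluster side, e.g. (a sketch using standard kubectl output formatting; column names are arbitrary):

kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory
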
nickatnceas added the enhancement label Aug 15, 2024
nickatnceas self-assigned this Aug 15, 2024
nickatnceas (Contributor, Author) commented:

I attempted deploying the k8s software onto host-ucsb-24 to run a bare-metal node, but hit some issues:

  • K8s 1.23.4 packages are no longer being distributed by the K8s project
  • The oldest packages still being distributed, 1.24.17, do not work with our cluster: the node software successfully connects to the controller nodes, but never starts any containers

Instead of troubleshooting this old version, I'm going to move back to using the VMs for now. Once we have successfully upgraded K8s (#35), we can try again.
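
For reference, installing a pinned 1.24.17 from the community package repo looks roughly like this (a sketch assuming a Debian/Ubuntu host; pkgs.k8s.io only carries v1.24 and newer, which is why 1.23.4 is unavailable, and the -* wildcard covers whatever the exact package revision is):

# Add the community-owned repo for the 1.24 minor release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.24/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.24/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update

# Pin the 1.24.17 patch release and hold it so unattended upgrades don't move it
sudo apt-get install -y kubelet='1.24.17-*' kubeadm='1.24.17-*' kubectl='1.24.17-*'
sudo apt-mark hold kubelet kubeadm kubectl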

nickatnceas (Contributor, Author) commented:

Quick view of the issue:

outin@bluey:~/.kube$ kubectl get pods -A -o wide | grep host-ucsb-24
ceph-csi-cephfs   ceph-csi-cephfs-csi-cephfsplugin-jc89x         0/3     CrashLoopBackOff   18 (4m39s ago)    16m      128.111.85.154    host-ucsb-24    <none>           <none>
ceph-csi-rbd      ceph-csi-rbd-csi-cephrbdplugin-q9wn8           3/3     Running            18 (3m32s ago)    16m      128.111.85.154    host-ucsb-24    <none>           <none>
kube-system       calico-node-hdpx4                              0/1     CrashLoopBackOff   6 (112s ago)      17m      128.111.85.154    host-ucsb-24    <none>           <none>
kube-system       kube-proxy-mqdd8                               0/1     CrashLoopBackOff   5 (113s ago)      17m      128.111.85.154    host-ucsb-24    <none>           <none>
velero            node-agent-dwwp2                               0/1     CrashLoopBackOff   6 (2m26s ago)     16m      192.168.99.136    host-ucsb-24    <none>           <none>
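
The kubelet and container-runtime logs on the host itself are probably the most direct place to see why containers never start (a sketch assuming a systemd-managed kubelet and containerd; unit names may differ on this host):

# On host-ucsb-24: recent kubelet errors
sudo journalctl -u kubelet --no-pager --since "1 hour ago" | grep -i error | tail -n 50

# Container runtime side (assuming containerd)
sudo journalctl -u containerd --no-pager --since "1 hour ago" | tail -n 50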

Here is the k8s-node-7 VM after about the same amount of startup time:

outin@bluey:~/.kube$ kubectl get pods -A -o wide | grep k8s-node-7
ceph-csi-cephfs   ceph-csi-cephfs-csi-cephfsplugin-c78rc         3/3     Running      3                 16m      128.111.85.146    k8s-node-7      <none>           <none>
ceph-csi-rbd      ceph-csi-rbd-csi-cephrbdplugin-jr8c2           3/3     Running      3                 16m      128.111.85.146    k8s-node-7      <none>           <none>
kube-system       calico-node-pbchl                              1/1     Running      1                 16m      128.111.85.146    k8s-node-7      <none>           <none>
kube-system       kube-proxy-6kbc5                               1/1     Running      1                 16m      128.111.85.146    k8s-node-7      <none>           <none>
velero            node-agent-wqwn6                               1/1     Running      3 (11m ago)       16m      192.168.197.192   k8s-node-7      <none>           <none>

mbjones (Member) commented Oct 18, 2024

For the pods in CrashLoopBackOff, you should get some helpful troubleshooting info by describing the pod status (e.g., kubectl describe -n kube-system pod kube-proxy-mqdd8).
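
A minimal version of that, plus pulling the logs of the previous (crashed) container instance (pod name taken from the output above):

kubectl describe -n kube-system pod kube-proxy-mqdd8
kubectl logs -n kube-system kube-proxy-mqdd8 --previous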

nickatnceas (Contributor, Author) commented:

> For the pods in CrashLoopBackOff, you should get some helpful troubleshooting info by describing the pod status (e.g., kubectl describe -n kube-system pod kube-proxy-mqdd8).

I don't think this is worth troubleshooting, mainly because these two versions are so old (1.23 and 1.24); the time would be better spent upgrading to the latest version and troubleshooting those issues (#35), then fixing any issues that arise from this migration.

mbjones (Member) commented Oct 18, 2024

Yep, totally agree on the version/upgrade stuff. Sorry for the diversion.
