Welcome to the OpenCHAMI hands-on tutorial! This guide walks you through building a complete PXE-boot & cloud-init environment for HPC compute nodes using libvirt/KVM.
The cloud-based instance provided for this class is detailed in AWS_Environment.md. Your instance must meet these requirements before you begin:
- OS & Kernel:
  - RHEL/CentOS/Rocky 9+ or equivalent
  - Linux kernel ≥ 5.10 with cgroups v2 enabled
- Packages (minimum versions):
  - QEMU 6.x, `virt-install` ≥ 4.x
  - Podman 4.x
- Networking:
  - Bridge device (e.g. `br0`)
- Storage:
  - NFS (or equivalent) export for `/var/lib/ochami/images`
  - MinIO (or S3) with credentials ready
  - OCI container registry with credentials ready
- Tools: `tcpdump`, `tftp`, `virsh`, `curl`
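The requirements above can be spot-checked before starting the labs. The following is a minimal sketch, not part of the OpenCHAMI tooling: the function names `version_ge`, `check_tools`, and `preflight` are hypothetical, the version comparison assumes GNU `sort -V`, and the cgroups v2 test assumes the unified hierarchy is mounted at `/sys/fs/cgroup`. QEMU is not checked because its binary is often outside `PATH` on RHEL-family systems.

```shell
#!/usr/bin/env bash
# Preflight sketch: report whether the host meets the stated prerequisites.
# All function names here are illustrative, not part of OpenCHAMI.

# Return 0 if dotted version $1 is >= $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Print OK/MISSING for each command name given; return 1 if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "OK       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check kernel version, cgroups v2, and the listed tools.
preflight() {
    local kernel status=0
    kernel="$(uname -r | cut -d- -f1)"   # e.g. 5.14.0-427.el9 -> 5.14.0
    if version_ge "$kernel" 5.10; then
        echo "OK       kernel $kernel"
    else
        echo "TOO OLD  kernel $kernel (need >= 5.10)"
        status=1
    fi
    # Assumption: cgroups v2 exposes this file at the root of the hierarchy.
    if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
        echo "OK       cgroups v2"
    else
        echo "MISSING  cgroups v2"
        status=1
    fi
    check_tools tcpdump tftp virsh curl virt-install podman || status=1
    return "$status"
}
```

After sourcing the file, run `preflight`; a non-zero return status means at least one check failed. The script only reports findings and does not install or change anything.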
A quick snapshot of the data flows:
- Discovery: Head node learns about virtual nodes via `ochami discover`.
- Image Build: Containerized image layers → squashfs → organized with registry and served via S3.
- Provisioning: PXE boot → TFTP pulls kernel/initrd → installer.
- Config & Join: cloud-init applies user-data, finalizes OS.
Each “Phase” is a self-contained lab with a checkpoint exercise.
- Instance Preparation
  - Host packages, kernel modules, cgroups, bridge setup, NFS setup
  - Deploy MinIO, nginx, and registry
  - Checkpoints:
    - `systemctl status minio`
    - `systemctl status registry`
- OpenCHAMI & Core Services
  - Install OpenCHAMI RPMs
  - Deploy internal Certificate Authority and import signing certificate
  - Checkpoints:
    - `ochami bss status`
    - `systemctl list-dependencies openchami.target`
- Static Discovery & SMD Population
  - Anatomy of `nodes.yaml`, `ochami discover`
  - Checkpoint: `ochami smd component get | jq '.Components[] | select(.Type == "Node")'`
- Image Builder
  - Define base, compute, debug container layers
  - Build & push to registry/S3
  - Checkpoints:
    - `s3cmd ls -Hr s3://boot-images/`
    - `regctl tag ls demo.openchami.cluster:5000/demo/rocky-base`
- PXE Boot Configuration
  - `boot.yaml`, BSS parameters, virt-install examples
  - Verify DHCP options & TFTP with `tcpdump`, `tftp`
  - Checkpoint: Successful serial console installer
- Cloud-Init Configuration
  - Merging `cloud-init.yaml`, host-group overrides
  - Customizing users, networking, mounts
  - Checkpoint: Inspect `/var/log/cloud-init.log` on node
- Virtual Compute Nodes & Demo
  - `virsh console`, node reboot workflows, cleanup scripts
  - Scaling to multiple nodes with a looped script
  - Checkpoint: Run a sample MPI job across two VMs
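The "looped script" for scaling to multiple nodes in the last phase could take roughly the following shape. This is a dry-run sketch under stated assumptions, not the tutorial's actual script: the node names (`compute1`, `compute2`, ...), the `52:54:00:be:ef:NN` MAC pattern, and the memory and vCPU sizes are all hypothetical, and the MAC addresses must match whatever `nodes.yaml` declares. It only prints `virt-install` command lines so they can be reviewed first; real invocations may need further options (for example `--osinfo` on recent virt-install releases).

```shell
#!/usr/bin/env bash
# Sketch: print (not run) one virt-install command per virtual compute node.
# Names, MAC pattern, and sizes are illustrative assumptions.

# Build a MAC in the 52:54:00 range conventionally used for QEMU/KVM guests.
node_mac() {
    printf '52:54:00:be:ef:%02x\n' "$1"
}

# Emit the virt-install command line for node number $1 on bridge $2.
node_cmd() {
    local n="$1" bridge="${2:-br0}"
    printf '%s %s %s\n' \
        "virt-install --name compute${n} --memory 4096 --vcpus 2 --disk none --pxe" \
        "--network bridge=${bridge},mac=$(node_mac "$n")" \
        "--graphics none --console pty,target.type=serial --noautoconsole"
}

# Print commands for nodes 1..$1 attached to br0.
plan_nodes() {
    local i=1
    while [ "$i" -le "$1" ]; do
        node_cmd "$i" br0
        i=$((i + 1))
    done
}

plan_nodes 2
```

Printing instead of executing keeps the loop safe to experiment with; once the output looks right, the same lines can be run by hand or piped to a shell.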
- PXE ROM silent on serial
  - BIOS stage → VGA only; use `--extra-args 'console=ttyS0,115200n8 inst.text'`
- No DHCP OFFER
  - Verify via `sudo tcpdump -i br0 port 67 or 68`
- Service fails to start
  - Inspect `journalctl -u <service name>`, check port conflicts
- Certificate Issues
  - Ensure the system certificate bundle contains our root cert: `grep CHAMI /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem`
- Token Issues
  - Tokens are only valid for an hour. Renew with `export DEMO_ACCESS_TOKEN=$(sudo bash -lc 'gen_access_token')` in each terminal window
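Because tokens expire after an hour, long lab sessions tend to fail with authorization errors that are easy to misread. One way to avoid that is to record when the token was issued and renew it shortly before expiry. The sketch below assumes this bookkeeping approach; `DEMO_TOKEN_ISSUED`, the helper names, and the five-minute safety margin are hypothetical, while the renewal command itself is the one given above.

```shell
#!/usr/bin/env bash
# Sketch: renew DEMO_ACCESS_TOKEN shortly before its one-hour lifetime ends.
# DEMO_TOKEN_ISSUED and the 300-second margin are illustrative assumptions.
TOKEN_TTL=3600      # lifetime stated in the troubleshooting notes
TOKEN_MARGIN=300    # renew this many seconds early (assumption)

# Return 0 if a token issued at epoch $1 should be renewed at epoch $2.
token_needs_renewal() {
    [ $(( $2 - $1 )) -ge $(( TOKEN_TTL - TOKEN_MARGIN )) ]
}

# Fetch a new token with the documented command and remember when.
renew_token() {
    DEMO_ACCESS_TOKEN="$(sudo bash -lc 'gen_access_token')" || return 1
    export DEMO_ACCESS_TOKEN
    DEMO_TOKEN_ISSUED="$(date +%s)"
}

# Renew only when no issue time is recorded or the token is close to expiry.
ensure_token() {
    if [ -z "${DEMO_TOKEN_ISSUED:-}" ] ||
       token_needs_renewal "$DEMO_TOKEN_ISSUED" "$(date +%s)"; then
        renew_token
    fi
}
```

Source the file in each terminal and call `ensure_token` before a batch of `ochami` commands. The issue time lives only in that shell, so every terminal window still tracks its own token.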
- Replace insecure default credentials (MinIO, CoreDHCP admin).
- Use TLS for API endpoints and registry.
- Isolate VLANs for provisioning traffic.
- Harden cloud-init scripts: avoid embedding secrets in plaintext.
- OpenCHAMI Docs: https://openchami.org
- cloud-init Reference: https://cloudinit.readthedocs.io
- PXE/TFTP How-To: https://wiki.archlinux.org/title/PXE
- Give Feedback: [Issue Tracker or Feedback Form Link]
© 2025 OpenCHAMI Project · Licensed under Apache 2.0
LA-UR-25-25073