v1.156
What's Changed
Due to the size of this release, PRs are grouped below. In brief:
- This release addresses various breakages caused by changes to upstream repos. As a result, as of this release the StackHPC images (see below) ship with all dnf repos disabled, and building images requires either credentials for StackHPC's Ark server or a local Pulp server mirrored from Ark.
- OFED and CUDA are no longer shipped in StackHPC images and must be added via an image build.
- StackHPC images move to Rocky Linux 9.5 and 8.10.
- Added support for NVIDIA DOCA instead of OFED.
- Added support for Lustre clients.
- OpenHPC role supports using the same nodes in multiple partitions/groups.
- Additional packages can be added via appliances_default_extra_packages.
Isolation from upstream dnf repos
- Remove CUDA and OFED builds from CI by @bertiethorpe in #479
- Use Rocky 9.4 release train snapshots for builds by @wtripp180901 in #486
- Support site Pulp server for image builds by @wtripp180901 in #490
- Pin nvidia-driver and cuda packages to working packages by @sjpb in #496
- Bump RL9.4 repo timestamps to latest snapshots by @wtripp180901 in #497
- Refactor pulp/dnf roles to avoid having to redefine Ark URLs by @wtripp180901 in #507
- Release train support for Rocky 8.10 by @wtripp180901 in #501
- Bump appliance to Rocky 9.5 + release train support by @wtripp180901 in #503
- Fix python/ansible/pulp squeezer versions for RL8 deploy hosts by @sjpb in #516
- Add Release Train OpenHPC repos by @wtripp180901 in #515
New functionality
- Support Lustre client by @sjpb in #447
- Install k3s cluster with ansible-init by @wtripp180901 in #441
- Make block device detection work on ESXi by @mkjpryor in #481
- Add role to install NVIDIA DOCA on top of an existing "fat" image by @sjpb in #492
- Fix DOCA install cleanup deleting /tmp by @sjpb in #494
- Add list of additional package installs by @wtripp180901 in #499
- EXPERIMENTAL: add machinery to allow compute nodes to rejoin cluster on reimage by @sjpb in #500
- Ansible-init compute node script by @bertiethorpe in #476
Docs
- Add missing bits re. initial setup to refactored README by @sjpb in #464
- Add generic upgrade docs by @sjpb in #462
- Add note about login node reboot when changing OOD servername by @sd109 in #510
Fixes
- Remove local DNS as a dependency for k3s by @sjpb in #442
- Fix adhoc/rebuild wait_for_connection race condition by @bertiethorpe in #483
- Fix Lustre deleting rdma packages and bump to v2.15.6 for RL9.5 support by @wtripp180901 in #502
Upgrades
- Upgrade RL8 ceph to quincy + trivy rate limit and OOD false positives fix by @wtripp180901 in #477
- Bump openhpc role for slurm restart, templating and nodes in multiple groups by @sjpb in #488
Internal CI changes/fixes
- Don't run trivy scan on nightly builds by @sjpb in #467
- Unset signature_verified property from nightly/latest images by @sjpb in #474
- Don't fail cluster cleanup when prefix not found by @bertiethorpe in #480
- Fix nightly images getting timestamp/git hash by @sjpb in #493
- Fix nightly build version (v2) by @sjpb in #495
- Remove use of FIPs for leafcloud packer builds by @sjpb in #498
Image Details
Two new images are available (neither contains OFED):
- RL8: openhpc-RL8-250106-0916-f8603056
- RL9: openhpc-RL9-250106-0916-f8603056
Full Changelog: v1.155...v1.156