v1.156

@sjpb sjpb released this 07 Jan 13:20
· 84 commits to main since this release
4def5ba

What's Changed

Due to the size of this release, PRs are grouped below. In brief:

  • This release addresses various breakages caused by changes to upstream repos. As a result, the StackHPC images (see below) now ship with all dnf repos disabled, and building images requires either credentials for StackHPC's ark server or a local Pulp server mirrored from ark.
  • OFED and CUDA are no longer shipped in StackHPC images and must be added via an image build.
  • StackHPC images move to RockyLinux 9.5 and 8.10.
  • Added support for NVIDIA DOCA instead of OFED.
  • Added support for Lustre clients.
  • OpenHPC role supports using the same nodes in multiple partitions/groups.
  • Additional packages can be added via appliances_default_extra_packages.

Isolation from upstream dnf repos

New functionality

  • Support lustre client by @sjpb in #447
  • Install k3s cluster with ansible init by @wtripp180901 in #441
  • Make block device detection work on ESXi by @mkjpryor in #481
  • Add role to install NVIDIA DOCA on top of an existing "fat" image by @sjpb in #492
  • Fix DOCA install cleanup deleting /tmp by @sjpb in #494
  • Add list of additional package installs by @wtripp180901 in #499
  • EXPERIMENTAL: add machinery to allow compute nodes to rejoin cluster on reimage by @sjpb in #500
  • Ansible-init compute node script by @bertiethorpe in #476

Docs

  • Add missing bits re. initial setup to refactored README by @sjpb in #464
  • Add generic upgrade docs by @sjpb in #462
  • Add note about login node reboot when changing OOD servername by @sd109 in #510

Fixes

  • Remove local DNS as a dependency for k3s by @sjpb in #442
  • Fix adhoc/rebuild wait_for_connection race condition by @bertiethorpe in #483
  • Fix Lustre deleting rdma packages and bump to v2.15.6 for RL9.5 support by @wtripp180901 in #502

Upgrades

  • Upgrade RL8 ceph to quincy + trivy rate limit and OOD false positives fix by @wtripp180901 in #477
  • Bump openhpc role for slurm restart, templating and nodes in multiple groups by @sjpb in #488

Internal CI changes/fixes

  • Don't run trivy scan on nightly builds by @sjpb in #467
  • Unset signature_verified property from nightly/latest images by @sjpb in #474
  • Don't fail cluster cleanup when prefix not found by @bertiethorpe in #480
  • Fix nightly images getting timestamp/git hash by @sjpb in #493
  • Fix nightly build version (v2) by @sjpb in #495
  • Remove use of FIPs for leafcloud packer builds by @sjpb in #498

Image Details

Two new images are available (neither of which now contains OFED):

  • RL8: openhpc-RL8-250106-0916-f8603056
  • RL9: openhpc-RL9-250106-0916-f8603056
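The image names above appear to encode the OS release, a build date and time, and a short git hash. That layout is inferred from the two names listed here and is not stated in this release, so the following is only a minimal sketch under that assumption. The helper name parse_image_name and the field names it returns are hypothetical, chosen for illustration.

```python
import re
from datetime import datetime

# ASSUMPTION: names look like openhpc-<OS>-<YYMMDD>-<HHMM>-<8 hex chars>.
# This pattern is inferred from the two image names in these notes only.
IMAGE_NAME = re.compile(
    r"^openhpc-(?P<os>RL\d+)-(?P<date>\d{6})-(?P<time>\d{4})-(?P<commit>[0-9a-f]{8})$"
)


def parse_image_name(name):
    """Split an image name into OS release, build time and short commit hash."""
    match = IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised image name: {name!r}")
    # Combine the date and time fields into a single datetime.
    built = datetime.strptime(match["date"] + match["time"], "%y%m%d%H%M")
    return {"os": match["os"], "built": built, "commit": match["commit"]}


if __name__ == "__main__":
    for image in (
        "openhpc-RL8-250106-0916-f8603056",
        "openhpc-RL9-250106-0916-f8603056",
    ):
        info = parse_image_name(image)
        print(info["os"], info["built"].isoformat(), info["commit"])
```

A helper like this could be used to check that the RL8 and RL9 images in a deployment come from the same build, by comparing their commit fields.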

New Contributors

Full Changelog: v1.155...v1.156