Bump: 24.05.6 release #70

itkovian · 2025-02-26T15:35:22Z

No description provided.

Suspended jobs are not removed from node usage, so if you cancel one afterwards, a pointer to a finished job is left there. This causes two issues:
1. It can prevent the evaluated job from running.
2. If the deleted job is purged, any attempt to read its contents will lead to bad data and a potential crash. In the related ticket, _is_job_sharing was segfaulting.

Changelog: Fix crash and issues evaluating job's suitability for running in nodes with already suspended job(s) there.
Ticket: 21767
Cherry-picked: 19d9185
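The stale-pointer problem described in this commit message can be illustrated with a minimal sketch. Everything here (`demo_job`, `demo_node_usage`, `demo_usage_add`, `demo_usage_remove`) is a hypothetical stand-in and not Slurm's actual data structures or fix; it only shows the general idea that a per-node usage table should drop its reference to a job when that job ends, whether or not the job was suspended at the time.

```c
#include <stddef.h>

#define MAX_NODE_JOBS 8

/* Hypothetical stand-ins, not Slurm's real structures. */
struct demo_job {
	int id;
	int suspended;
};

struct demo_node_usage {
	struct demo_job *jobs[MAX_NODE_JOBS];
	size_t count;
};

/* Track a job on the node; returns 0 on success, -1 if the table is full. */
int demo_usage_add(struct demo_node_usage *u, struct demo_job *job)
{
	if (u->count >= MAX_NODE_JOBS)
		return -1;
	u->jobs[u->count++] = job;
	return 0;
}

/*
 * Drop every reference to the job, suspended or not, so that no pointer
 * to a finished job stays behind. Returns the number of entries removed.
 */
size_t demo_usage_remove(struct demo_node_usage *u, const struct demo_job *job)
{
	size_t kept = 0, removed = 0;

	for (size_t i = 0; i < u->count; i++) {
		if (u->jobs[i] == job)
			removed++;
		else
			u->jobs[kept++] = u->jobs[i];
	}
	u->count = kept;
	return removed;
}
```

In this sketch the caller would invoke `demo_usage_remove()` on cancellation before the job record is freed, so later readers of the table never dereference a purged job.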

Cherry-pick !428 into slurm-24.05 See merge request SchedMD/dev/slurm!506

Cherry-picked: 5698114

Cherry-pick !516 into slurm-24.05 See merge request SchedMD/dev/slurm!518

When a job taking 2 or more nodes had all of its nodes fail, and no EpilogSlurmctld was configured, job requeuing was not correctly processed as batch_requeue_fini was not called. This resulted in the following issues:
- Requeued job was not assigned a new SLUID.
- Job steps of new jobs were not being reset to 0.
This left incorrect entries in the accounting database for the requeued job. Added a batch_requeue_fini call to fix that.

Ticket: 20177
Changelog: Fixed a job requeuing issue that merged job entries into the same SLUID when all nodes in a job failed simultaneously.
Cherry-picked: d7c0dfc

Cherry-pick !322 into slurm-24.05 See merge request SchedMD/dev/slurm!541

Newer cxi drivers changed the kernel module to "cxi_ss1". To maintain support for new and old drivers, first attempt the new location then attempt the old one when checking rdzv_get_en_default.

Changelog: switch/hpe_slingshot - Fix compatibility with newer cxi drivers, specifically when specifying disable_rdzv_get.
Ticket: 22087
Cherry-picked: e8ed3df
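The "new location first, then the old one" lookup can be sketched as a generic ordered fallback. This is not the code from the commit: `read_first_int` is a hypothetical helper, the candidate paths are supplied by the caller, and the assumption that the parameter file holds a plain integer is made only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Try each candidate path in order and read an integer from the first one
 * that opens and parses. Returns 0 and stores the value on success, or -1
 * if no candidate yields a value.
 * Assumption: the file content is a plain decimal integer.
 */
int read_first_int(const char *const *paths, size_t n_paths, int *value)
{
	for (size_t i = 0; i < n_paths; i++) {
		FILE *fp = fopen(paths[i], "r");
		int parsed;

		if (!fp)
			continue;	/* not present here, try the next path */
		parsed = fscanf(fp, "%d", value);
		fclose(fp);
		if (parsed == 1)
			return 0;
	}
	return -1;
}
```

A caller would list the path under the newer module name first and the older one second, so hosts with either driver generation resolve the setting without extra configuration.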

Cherry-pick !579 into slurm-24.05 See merge request SchedMD/dev/slurm!582

Trigger abort() rather than exit() for any fatal() message.

Changelog: Add ABORT_ON_FATAL environment variable to capture a backtrace from any fatal() message.
Issue: 50181
Ticket: 21582
Cherry-picked: 5666caa
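A minimal sketch of the behaviour described above, under stated assumptions: `demo_fatal` and `abort_on_fatal_requested` are hypothetical names, not Slurm's implementation, and the sketch assumes the variable only needs to be set (its value is not inspected), which the commit message does not confirm.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: presence of the variable is enough; the value is ignored. */
bool abort_on_fatal_requested(void)
{
	return getenv("ABORT_ON_FATAL") != NULL;
}

/* Hypothetical stand-in for a fatal() handler, not Slurm's own code. */
void demo_fatal(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	if (abort_on_fatal_requested())
		abort();	/* raises SIGABRT so a core dump and backtrace can be captured */
	exit(EXIT_FAILURE);
}
```

The design point is that exit() ends the process cleanly and leaves nothing for a debugger, while abort() raises SIGABRT, which can produce a core file when core dumps are enabled on the host.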

Cherry-pick !575 into slurm-24.05 See merge request SchedMD/dev/slurm!586

Ticket: 22162 Cherry-picked: feef273

Cherry-pick !615 into slurm-24.05 See merge request SchedMD/dev/slurm!629

Update slurm.spec and debian/changelog as well.

tripiana and others added 16 commits January 29, 2025 20:30

Merge branch 'cherrypick-428-24.05' into 'slurm-24.05'

462b9c9

Cherry-pick !428 into slurm-24.05 See merge request SchedMD/dev/slurm!506

Testsuite - Improve test22.1 avoiding false failures in busy nodes

1828cfa

Cherry-picked: 5698114

Merge branch 'cherrypick-516-24.05' into 'slurm-24.05'

ffd8822

Cherry-pick !516 into slurm-24.05 See merge request SchedMD/dev/slurm!518

Merge branch 't21839_2405' into 'slurm-24.05'

a254f7a

Cherry-pick !322 into slurm-24.05 See merge request SchedMD/dev/slurm!541

Merge branch 'cherrypick-579-24.05' into 'slurm-24.05'

62aa671

Cherry-pick !579 into slurm-24.05 See merge request SchedMD/dev/slurm!582

Add ABORT_ON_FATAL environment variable

9ad45a9

Trigger abort() rather than exit() for any fatal() message.

Changelog: Add ABORT_ON_FATAL environment variable to capture a backtrace from any fatal() message.
Issue: 50181
Ticket: 21582
Cherry-picked: 5666caa

Merge branch 'cherrypick-575-24.05' into 'slurm-24.05'

ca1994f

Cherry-pick !575 into slurm-24.05 See merge request SchedMD/dev/slurm!586

Testsuite - Improve start_slurmrestd() to avoid infinite loop

09a56e5

Ticket: 22162 Cherry-picked: feef273

Merge branch 'cherrypick-615-24.05' into 'slurm-24.05'

f7f3115

Cherry-pick !615 into slurm-24.05 See merge request SchedMD/dev/slurm!629

Docs - Update REST API reference

bd73041

Populate NEWS for 24.05.6

3565946

Update META for 24.05.6.

b0229f1

Update slurm.spec and debian/changelog as well.

Merge branch 'slurm-24.05' into 24.05.6.ug

b211a50

wdpypere merged commit 317c71a into hpcugent:24.05.ug Feb 26, 2025
1 check passed
