forked from SchedMD/slurm
Bump: 24.05.6 release #70
Merged
Suspended jobs are not removed from node usage, so if a suspended job is cancelled afterwards, node usage keeps a pointer to a finished job. This causes two issues:
1. It can prevent the evaluated job from running.
2. If the deleted job is purged, any attempt to read its contents reads bad data and can crash.
In the related ticket, _is_job_sharing was segfaulting.
Changelog: Fix crash and issues evaluating a job's suitability for running on nodes that already have suspended jobs.
Ticket: 21767
Cherry-picked: 19d9185
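To illustrate the failure mode, here is a minimal sketch of per-node usage bookkeeping in which every job that ends, including a suspended job that gets cancelled, is removed from the node's record. Everything here is hypothetical: `node_usage_t`, `node_usage_add()`, `node_usage_remove()` and `node_is_shared()` are illustrative names, not Slurm's real node-usage structures or `_is_job_sharing()`. The sketch also stores job IDs instead of pointers, which is one (assumed) way to make a stale entry harmless rather than a dangling reference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node record of jobs occupying a node (running or
 * suspended). Illustration only; not Slurm's real data structure. */
#define MAX_JOBS_PER_NODE 8

typedef struct {
	uint32_t job_ids[MAX_JOBS_PER_NODE]; /* IDs, not pointers to job records */
	size_t count;
} node_usage_t;

/* Record that job_id occupies this node. Returns false if the table is full. */
bool node_usage_add(node_usage_t *nu, uint32_t job_id)
{
	if (nu->count == MAX_JOBS_PER_NODE)
		return false;
	nu->job_ids[nu->count++] = job_id;
	return true;
}

/* Remove job_id from the node. In this sketch it must be called whenever a
 * job ends, including when a suspended job is cancelled, so no entry
 * outlives its job. Returns true if an entry was removed. */
bool node_usage_remove(node_usage_t *nu, uint32_t job_id)
{
	for (size_t i = 0; i < nu->count; i++) {
		if (nu->job_ids[i] == job_id) {
			/* Overwrite with the last entry; order is irrelevant here. */
			nu->job_ids[i] = nu->job_ids[--nu->count];
			return true;
		}
	}
	return false;
}

/* True if a job other than candidate_id still occupies the node, i.e.
 * placing candidate_id here would mean sharing the node. */
bool node_is_shared(const node_usage_t *nu, uint32_t candidate_id)
{
	for (size_t i = 0; i < nu->count; i++)
		if (nu->job_ids[i] != candidate_id)
			return true;
	return false;
}
```

Without the removal on cancel, `node_is_shared()` would keep reporting the node as shared after the suspended job is gone, which corresponds to issue 1 above. With raw pointers instead of IDs, the same stale entry becomes issue 2.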
Cherry-pick !428 into slurm-24.05
See merge request SchedMD/dev/slurm!506
Cherry-pick !516 into slurm-24.05
See merge request SchedMD/dev/slurm!518
When a job spanning two or more nodes had all of its nodes fail and no EpilogSlurmctld was configured, the job requeue was not processed correctly because batch_requeue_fini was not called. This resulted in the following issues:
- The requeued job was not assigned a new SLUID.
- Job steps of the new job were not reset to 0.
This left incorrect entries in the accounting database for the requeued job. Added a batch_requeue_fini call to fix this.
Ticket: 20177
Changelog: Fixed a job requeuing issue that merged job entries into the same SLUID when all nodes in a job failed simultaneously.
Cherry-picked: d7c0dfc
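The control-flow bug can be sketched as two paths that must both finish the requeue. This is a simplified model under stated assumptions: `job_t`, `requeue_fini()`, `handle_all_nodes_failed()` and `epilog_slurmctld_complete()` are hypothetical stand-ins (the real function is `batch_requeue_fini`, whose signature and call sites are not shown here), and the counter-based SLUID generator is a placeholder for whatever Slurm actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified job record; not Slurm's job_record_t. */
typedef struct {
	uint64_t sluid;        /* unique ID used for the accounting record */
	uint32_t next_step_id; /* next step number to hand out */
	bool requeue_pending;  /* requeue started but not yet finished */
} job_t;

static uint64_t next_sluid = 1; /* placeholder SLUID generator */

/* Hypothetical stand-in for batch_requeue_fini(): gives the requeued job
 * a fresh accounting identity and restarts its step numbering. */
static void requeue_fini(job_t *job)
{
	job->sluid = next_sluid++;
	job->next_step_id = 0;
	job->requeue_pending = false;
}

/* Called once every node of the job has failed. In this model the requeue
 * was previously finished only from the epilog-completion path, so with no
 * EpilogSlurmctld configured it was never finished. */
void handle_all_nodes_failed(job_t *job, bool epilog_slurmctld_configured)
{
	job->requeue_pending = true;
	if (epilog_slurmctld_configured)
		return; /* epilog_slurmctld_complete() will finish it later */
	requeue_fini(job); /* the added call: finish the requeue now */
}

/* Hypothetical hook run when EpilogSlurmctld completes. */
void epilog_slurmctld_complete(job_t *job)
{
	if (job->requeue_pending)
		requeue_fini(job);
}
```

Whichever path runs, the job leaves the requeue with a new SLUID and its step counter at 0. If only the epilog path existed, a job without EpilogSlurmctld would keep its old SLUID and steps, which is the merged-accounting symptom described above.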
Cherry-pick !322 into slurm-24.05
See merge request SchedMD/dev/slurm!541
Newer cxi drivers changed the kernel module to "cxi_ss1". To support both new and old drivers, try the new location first, then fall back to the old one when checking rdzv_get_en_default.
Changelog: switch/hpe_slingshot - Fix compatibility with newer cxi drivers, specifically when specifying disable_rdzv_get.
Ticket: 22087
Cherry-picked: e8ed3df
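The "new location first, then the old one" lookup can be sketched as reading an integer from the first readable file in an ordered list of candidate paths. The helper names `read_int_first_of()` and `get_rdzv_get_en_default()` are hypothetical, and the two sysfs paths are assumptions based on the standard `/sys/module/<module>/parameters/<param>` layout (the old module name in particular is a guess), so verify them against the installed cxi driver.

```c
#include <stdio.h>

/* Read one integer from the first readable file in paths[], in order.
 * Returns 0 and stores the value in *out on success, -1 if none worked. */
int read_int_first_of(const char *const paths[], size_t npaths, long *out)
{
	for (size_t i = 0; i < npaths; i++) {
		FILE *fp = fopen(paths[i], "r");

		if (!fp)
			continue; /* not present with this driver; try the next */
		int ok = (fscanf(fp, "%ld", out) == 1);
		fclose(fp);
		if (ok)
			return 0;
	}
	return -1;
}

/* Assumed sysfs locations, newest first; not verified against real drivers. */
static const char *const rdzv_get_en_paths[] = {
	"/sys/module/cxi_ss1/parameters/rdzv_get_en_default",  /* newer drivers */
	"/sys/module/cxi_core/parameters/rdzv_get_en_default", /* older drivers */
};

/* Hypothetical wrapper: look up rdzv_get_en_default on either driver. */
int get_rdzv_get_en_default(long *out)
{
	return read_int_first_of(rdzv_get_en_paths,
				 sizeof(rdzv_get_en_paths) /
					 sizeof(rdzv_get_en_paths[0]),
				 out);
}
```

Keeping the candidates in one ordered table means a future module rename only adds an entry instead of another nested fallback.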
Cherry-pick !579 into slurm-24.05
See merge request SchedMD/dev/slurm!582
Trigger abort() rather than exit() for any fatal() message.
Changelog: Add ABORT_ON_FATAL environment variable to capture a backtrace from any fatal() message.
Issue: 50181
Ticket: 21582
Cherry-picked: 5666caa
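The idea behind ABORT_ON_FATAL can be sketched as a fatal-error routine that picks abort() over exit() based on the environment. abort() raises SIGABRT, so the process stops with its call stack intact (a debugger or core dump can show the backtrace), while exit() runs normal teardown and leaves nothing to inspect. This is not Slurm's fatal(): `sketch_fatal()` and `fatal_should_abort()` are hypothetical names, and treating the variable as enabled whenever it is set, regardless of its value, is an assumption.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: ABORT_ON_FATAL is enabled whenever it is set, whatever its
 * value. Takes the getenv() result so the decision is easy to test. */
bool fatal_should_abort(const char *abort_on_fatal_env)
{
	return abort_on_fatal_env != NULL;
}

/* Hypothetical fatal-error routine: log the message, then terminate. */
void sketch_fatal(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("fatal: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);

	if (fatal_should_abort(getenv("ABORT_ON_FATAL")))
		abort(); /* SIGABRT: stack preserved for a core dump or debugger */
	exit(1);     /* default: ordinary exit, no backtrace available */
}
```

To get a usable core file from a daemon built this way, the variable must be present in the daemon's environment and core dumps must be allowed (for example `ulimit -c unlimited` in the launching shell).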
Cherry-pick !575 into slurm-24.05
See merge request SchedMD/dev/slurm!586
Ticket: 22162
Cherry-picked: feef273
Cherry-pick !615 into slurm-24.05
See merge request SchedMD/dev/slurm!629
Update slurm.spec and debian/changelog as well.
No description provided.