Description
Software Versions
snakemake: 8.27.1
snakemake-executor-plugin: 1.3.6
SLURM: 24.05.4
Describe the bug
A failed submission (sbatch: error: Batch job submission failed: Requested node configuration is not available) causes Snakemake to crash with a WorkflowError, leaving previously submitted jobs running or pending in the SLURM queue.
A minimal improvement would be for the workflow to shut down gracefully, either by canceling the remaining jobs and cleaning up after them or by waiting for them to finish. Another option would be to treat this like any other job failure, allowing the workflow to keep going and even retrying jobs that fail for this reason. Reporting the exact call to SLURM that caused the error would also help with troubleshooting these issues.
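To illustrate the second option, here is a rough sketch of a submission wrapper that reports a rejected sbatch call as an ordinary failure result instead of raising. This is not the plugin's actual code: `SubmitResult`, `submit_job`, and the injectable `run` parameter are hypothetical names, and the sketch assumes the job is submitted with `sbatch --parsable`, so that stdout contains the job ID (optionally followed by `;cluster`).

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class SubmitResult:
    """Outcome of one submission attempt (hypothetical type, for illustration)."""
    ok: bool
    command: str                  # the exact call, kept for troubleshooting
    jobid: Optional[str] = None
    error: Optional[str] = None


def submit_job(argv: Sequence[str], run: Callable = subprocess.run) -> SubmitResult:
    """Run an sbatch command and report a rejection instead of raising.

    `run` defaults to subprocess.run; it is a parameter only so the sketch
    can be exercised without a SLURM installation.
    """
    command = shlex.join(argv)
    proc = run(list(argv), capture_output=True, text=True)
    if proc.returncode != 0:
        # The caller can mark this one job as failed (and possibly retry it)
        # while the rest of the workflow keeps going.
        return SubmitResult(ok=False, command=command, error=proc.stderr.strip())
    # Assumes --parsable output: "<jobid>" or "<jobid>;<cluster>".
    jobid = proc.stdout.strip().split(";")[0]
    return SubmitResult(ok=True, command=command, jobid=jobid)
```

With something along these lines, a caller could log `result.command` next to `result.error` and hand the job to the normal failure/retry path rather than aborting the whole workflow.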
Logs
The full error message:
WorkflowError:
SLURM sbatch failed. The error message was sbatch: error: Batch job submission failed: Requested node configuration is not available
Additional context
My cluster's resource limits aren't very clear. Some combinations of resources generated in later attempts are evidently not allowed.
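For context, this is roughly how attempt-scaled resources can end up exceeding what the cluster accepts, and how a cap would avoid that. It is only a sketch: `capped_mem_mb` is a hypothetical helper, and the base and cap values are placeholders, not my cluster's real limits (which I don't know exactly).

```python
def capped_mem_mb(attempt: int, base_mb: int = 4000, cap_mb: int = 64000) -> int:
    """Scale memory linearly with the attempt number, but never past cap_mb.

    base_mb and cap_mb are placeholder values for illustration only.
    """
    return min(base_mb * attempt, cap_mb)


# In a Snakefile, this could be used in a rule's resources as a callable
# that receives the attempt number:
#
#   resources:
#       mem_mb=lambda wildcards, attempt: capped_mem_mb(attempt)
```

A cap like this only works around the problem when the limits are known; it doesn't change the fact that a rejected submission currently takes down the whole workflow.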