SLURM sbatch failed causes Snakemake crash #320

Open
@nikostr

Software Versions

snakemake: 8.27.1
snakemake-executor-plugin: 1.3.6
SLURM: 24.05.4

Describe the bug
When sbatch fails with "sbatch: error: Batch job submission failed: Requested node configuration is not available", Snakemake crashes with a WorkflowError and leaves previously submitted jobs running or pending in the SLURM queue.

A minimal improvement would be for the workflow to shut down cleanly, either by canceling the submitted jobs and cleaning up after them, or by waiting for them to finish. Another option would be to treat this like any other job failure, allowing the workflow to keep going and even allowing retries for jobs that fail this way. Also, reporting the exact sbatch call that caused the error would help with troubleshooting.
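
To illustrate the second and third suggestions, here is a minimal sketch of a submission wrapper that returns a failure record instead of raising, and keeps the exact command line for the error report. This is not the plugin's actual code: `SubmitResult` and `submit` are hypothetical names, and how the result would be hooked into Snakemake's job-failure and retry handling is left open.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SubmitResult:
    """Outcome of one submission attempt (hypothetical helper type)."""
    ok: bool
    job_id: Optional[str]   # stdout of the command on success
    command: str            # the exact call, kept for troubleshooting
    error: Optional[str]    # stderr or OS error text on failure


def submit(call: Sequence[str]) -> SubmitResult:
    """Run a submission command and report failure instead of raising."""
    command = shlex.join(call)
    try:
        proc = subprocess.run(call, capture_output=True, text=True, check=False)
    except OSError as exc:
        # For example, the executable is missing from PATH.
        return SubmitResult(False, None, command, str(exc))
    if proc.returncode != 0:
        # A rejected submission becomes an ordinary failed result,
        # so the caller can mark the job as failed and retry it.
        return SubmitResult(False, None, command, proc.stderr.strip())
    return SubmitResult(True, proc.stdout.strip(), command, None)
```

A caller could run something like `submit(["sbatch", "--parsable", "job.sh"])` (with `job.sh` as a placeholder script) and, when `ok` is false, log both `error` and `command` and mark only that job as failed rather than aborting the whole workflow.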

Logs

The full error message:

WorkflowError:
SLURM sbatch failed. The error message was sbatch: error: Batch job submission failed: Requested node configuration is not available

Additional context
My cluster's resource limits aren't clearly documented. Some combinations of resources generated in later attempts are evidently not accepted.
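
As a possible workaround on the workflow side, assuming the resources grow with Snakemake's `attempt` counter, the request can be capped so that later attempts never exceed a known-good value. The function name and the numbers below are hypothetical; the real cap depends on the cluster.

```python
def scaled_mem_mb(attempt: int, base_mb: int = 4000, cap_mb: int = 16000) -> int:
    """Grow the memory request with each attempt, but never beyond cap_mb."""
    if attempt < 1:
        raise ValueError("attempt starts at 1")
    return min(base_mb * attempt, cap_mb)


# In a Snakefile rule this could be used as:
#   resources:
#       mem_mb=lambda wildcards, attempt: scaled_mem_mb(attempt)
```

This does not address the crash itself; it only reduces the chance of requesting a node configuration that the scheduler rejects.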
