Hi all,

This is a basic question, but I would be glad to hear your thoughts on it: what is the best practice for designing a short-running rule that will be used to spawn many jobs (using Snakemake in a SLURM context, of course)? I would define "short-running" as less than 3 minutes, and "many jobs" as thousands of calls.
Without Snakemake, I would have used SLURM job arrays and a wrapper script to build batches of ~1 h jobs. My assumption is that it is best to give SLURM big enough chunks that we do not stress it with too many jobs (and stay below the maximum number of jobs allowed), but also small enough chunks that the scheduler is more likely to grant us resources (and allocate them fairly among users).
With Snakemake and the SLURM executor plugin, I would like to avoid writing a wrapper script, so I see two options:
I may write rules to split/gather the batches, much like the built-in scatter-gather feature. This works, but it amounts to writing the wrapper script as rules.
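For illustration, the first option could look roughly like this; the sample list, batch size, file layout, and `short_tool` are placeholders, not my actual workflow:

```python
# Hypothetical Snakefile sketch: pack many ~2 min calls into ~1 h jobs by
# batching samples manually. Names and paths are illustrative only.
SAMPLES = [f"sample{i:04d}" for i in range(3000)]
BATCH_SIZE = 30  # ~30 x 2 min ≈ 1 h per SLURM job

BATCHES = {
    f"batch{i // BATCH_SIZE:03d}": SAMPLES[i:i + BATCH_SIZE]
    for i in range(0, len(SAMPLES), BATCH_SIZE)
}

rule all:
    input:
        expand("results/{batch}.done", batch=BATCHES)

rule process_batch:
    # One SLURM job per batch; the short per-sample calls run in series inside it.
    input:
        lambda wc: expand("data/{sample}.txt", sample=BATCHES[wc.batch])
    output:
        touch("results/{batch}.done")
    resources:
        runtime=75  # minutes; 30 x 2 min plus some margin
    shell:
        "for f in {input}; do short_tool $f; done"
```

Run with the SLURM executor, this submits one job per batch instead of one job per sample, but the batching logic still has to live in the Snakefile.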
Alternatively, I may use `group` and `group-components`, as in "Bundle many small jobs into one larger job submission" (snakemake#872). This also works, but I find it somewhat cumbersome to parametrize resources. For example, if I want to build ~1 h batches out of a rule that typically takes ~2 min, I must first set `cores` to this rule's `cpus_per_task` to make sure the calls run in series, then set `group-components` to 30 (= 60/2); but since `cores` applies to all groups, it gets more complex if I must design "batches" for several rules.
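A rough illustration of what I mean with the second option (again, rule, group, and tool names are placeholders):

```python
# Hypothetical sketch of the group-based variant. Each instance is short,
# and assigning it to a group lets Snakemake bundle several instances into
# a single SLURM submission.
rule short_task:
    input:
        "data/{sample}.txt"
    output:
        "results/{sample}.out"
    group:
        "short_tasks"
    resources:
        runtime=5,        # a single call is ~2 min; leave some margin
        cpus_per_task=1   # each call only needs one core
    shell:
        "short_tool {input} > {output}"
```

which I would then run with something like `snakemake --executor slurm --group-components short_tasks=30`, so that ~30 calls end up in one SLURM submission.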
Are my assumptions correct? What do you usually do to deal with short-running rules called many times?
Thanks!