This PR allows users to:
- set a custom log directory for SLURM jobs
- enable auto-deletion of SLURM log files older than 10 days
- change that retention period or disable auto-deletion altogether
- auto-delete the log files of successful SLURM jobs (with the option to disable this default behaviour)
The idea behind this PR is to
- limit the number of SLURM log files kept, as a workflow can easily produce thousands of log files. In the case of a successful job, the information is redundant anyway, hence the proposed auto-deletion of logs of successful jobs.
- let users select a custom directory for SLURM log files, so that different workflows can point to one prefix. Together with the auto-deletion of older log files, this further limits the number of log files present.
It addresses issues #94 and #123.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
- **New Features**
- Enhanced SLURM job log management with a configurable log directory.
- Added options to control log file retention and automatic cleanup of
old logs.
- Introduced new command-line flags for specifying log management
settings.
- **Documentation**
- Updated documentation with new command-line flags for log management.
- Improved guidance on managing job logs and their retention policies.
- **Improvements**
- More flexible control over SLURM job log handling.
- Better support for cleaning up empty directories after job completion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Johannes Köster <[email protected]>
| SLURM flag | Snakemake resource | Description |
|------------|--------------------|-------------|
| `--partition` | `slurm_partition` | the partition a rule/job is to use |
| `--time` | `runtime` | the walltime per job in minutes |
| `--constraint` | `constraint` | may hold features on some clusters |
| `--mem` | `mem`, `mem_mb` | memory a cluster node must provide (`mem`: string with unit, `mem_mb`: in megabytes) |
| `--mem-per-cpu` | `mem_mb_per_cpu` | memory per reserved CPU |
| `--ntasks` | `tasks` | number of concurrent tasks / ranks |
| `--cpus-per-task` | `cpus_per_task` | number of CPUs per task (in the case of SMP, rather use `threads`) |
| `--nodes` | `nodes` | number of nodes |
| `--clusters` | `clusters` | comma separated string of clusters |
Each of the listed command line flags can be part of a rule, e.g.:

```python
rule:
    ...
```

Resources can also be set per rule in a profile's `set-resources` section, e.g.:

```yaml
set-resources:
    ...
        cpus_per_task: 40
```
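For instance, a complete rule requesting SLURM resources might look like this (a minimal sketch; the rule name, file names, and partition are illustrative):

```python
rule align_reads:
    input:
        "data/sample.fastq",
    output:
        "results/sample.bam",
    threads: 8
    resources:
        slurm_partition="compute",  # illustrative partition name
        runtime=60,                 # walltime in minutes
        mem_mb=8000,                # memory in megabytes
        cpus_per_task=8,            # CPUs reserved per task
    shell:
        "echo 'aligning {input}' > {output}"
```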
#### Additional Command Line Flags

This plugin defines additional command line flags. As always, these can be set on the command line or in a profile.

| Flag | Meaning |
|------|---------|
| `--slurm_init_seconds_before_status_checks` | modify the time before the initial job status check; the default of 40 seconds avoids load on the SLURM database, but shorter wait times are useful, for example, during workflow development |
| `--slurm_requeue` | allows jobs to be resubmitted automatically if they fail or are preempted; see the [section "retries"](#retries) for details |
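In a profile, these flags might be given like so (a sketch; the values are illustrative, and the keys are assumed to simply mirror the flag names without the leading dashes):

```yaml
# config.yaml of a Snakemake profile (illustrative values)
slurm_init_seconds_before_status_checks: 20
slurm_requeue: true
```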
#### Multicluster Support

For reasons of scheduling, multicluster support is provided by the `clusters` flag in resources sections. Note that you have to write `clusters`, not `cluster`!
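A rule pinned to specific clusters could then look like this (a sketch; the cluster names are illustrative):

```python
rule remote_step:
    output:
        "results/remote.txt",
    resources:
        clusters="cluster_a,cluster_b",  # note: 'clusters', not 'cluster'
    shell:
        "touch {output}"
```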
==This is ongoing development. Eventually you will be able to annotate different file access patterns.==
### Log Files - Getting Information on Failures

Snakemake, via this SLURM executor, submits itself as a job. This ensures that all features are preserved in the job context. SLURM requires a log file to be written for _every_ job. This is redundant information which only contains the Snakemake output already printed on the terminal. If a rule is equipped with a `log` directive, the SLURM logs only contain Snakemake's own output.

This executor removes the SLURM logs of successful jobs immediately once they are finished. You can change this behaviour with the flag `--slurm-keep-successful-logs`. The log file of a failed job is preserved for 10 days by default. You may change this value using the `--slurm-delete-logfiles-older-than` flag.

The default location of Snakemake log files is relative to the directory where the workflow is started, or relative to the directory indicated with `--directory`. SLURM logs produced by Snakemake can be redirected using `--slurm-logdir`. If you want to avoid log files accumulating in different directories, you can store them in your home directory. Best put the parameter in your profile then, e.g.:
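A sketch, assuming the profile keys mirror the flag names, with an illustrative path and retention period:

```yaml
# config.yaml of a Snakemake profile (illustrative values)
slurm-logdir: "/home/<username>/snakemake_slurm_logs"
slurm-keep-successful-logs: true       # also keep logs of successful jobs
slurm-delete-logfiles-older-than: 5    # delete preserved logs after 5 days (default: 10)
```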
Some cluster jobs may fail. In this case, Snakemake can be instructed to attempt another submission before the entire workflow fails, in this example up to 3 times:
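One way to do this is Snakemake's standard `--retries` flag (shown here with the SLURM executor; other options omitted):

```console
snakemake --executor slurm --retries=3
```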