Commit a8d131b

feat!: format shell directives using shfmt (#295)
* feat!: format shell directives using shfmt

  - Add shfmt-py as dependency to format Snakemake shell directives.
  - Mask variables like {input} to prevent shfmt mangling.
  - Ignore escaped braces like {{ }} during masking.
  - Extract and format multiline string literals safely.
  - Add --no-format-shell CLI flag to opt-out.

  Shell_formatter internals are split into focused helpers (_mask_snakemake_vars, _invoke_shfmt, _unmask_snakemake_vars) to give future backend swaps a clean seam; see docs/adr/0001-shell-formatter-distribution.md.

  Fixes a bug where format_python_string_literal received literals with a trailing newline from the parser (a NEWLINE token after the closing triple-quote), causing re.fullmatch to return the string unchanged and leaving shell blocks silently unformatted in all real snakefmt runs. Fix: rstrip the literal before regex matching.

  Adds a parametrised robustness suite covering all trailing-whitespace input shapes, and end-to-end behavioural tests through the public formatter and CLI.

  Closes #170

* docs: add ADR-0001, bugfix plan, and update README for shell formatting

  - docs/adr/README.md: index for architecture decision records
  - docs/adr/0001-shell-formatter-distribution.md: records the decision to keep shfmt-py rather than forking or building Go bindings, with concrete revisit triggers
  - docs/plans/shell-formatting-trailing-newline-bugfix.md: working plan for the trailing-newline bug identified and fixed in the previous commit
  - README: new Shell Block Formatting section with before/after example, opt-out instructions, placeholder masking and invalid-shell notes; updated TOC, --help block, configuration example (sort_directives, format_shell), pre-commit rev (v0.10.2 -> v1.1.0), and Recent Changes callout

* feat(lua): expose format_shell option in Neovim plugin

  Adds format_shell = nil to config defaults. When set to false, --no-format-shell is passed to the snakefmt binary, allowing users to disable shell block formatting from their editor config without touching pyproject.toml.

      require("snakefmt").setup({ format_shell = false })

  nil (default) passes no flag, deferring to snakefmt's own default (on).

* fix: update format_shell_code to unpack 3-tuple from _mask_snakemake_vars

  After the UUID nonce refactor, _mask_snakemake_vars returns (masked, tokens, originals) but format_shell_code still unpacked a 2-tuple. Update the call site to match the new signature.

* fix: address PR #295 review comments

  - C1: unified two-layer brace masking: {{...}} is now masked before single {var} placeholders, ensuring brace groups, parameter expansion, brace expansion, awk patterns, and f-string Snakemake vars all survive shfmt unchanged
  - B2: replace textwrap.indent with _indent_preserving_heredocs, which skips heredoc body and terminator lines so <<EOF terminators remain at column 0 as bash requires
  - T1/C2: add unit tests for _indent_preserving_heredocs, integration heredoc tests with concrete expected values, and a long-line no-spurious-wrap test confirming shfmt's flags are syntactic only
  - M1: move format_shell from imperative post-construction attribute into Formatter.__init__ (default False, matching sort_directives); update setup_formatter and all TestShellBlockFormatting tests to opt in explicitly with format_shell=True
  - S1: consolidate two re.fullmatch calls into a single compiled _TRIPLE_QUOTE_RE using "{3}|'{3} to avoid backslash escapes in raw strings that confuse black's parser
  - README: add "Brace groups" section documenting the {{...}} preservation trade-off and the fmt: off escape hatch

* chore(deps): bump shfmt-py to >=4.0.0,<5.0.0

  v4.0.0 bundles shfmt v3.13.1 (latest upstream), decouples the package version from the shfmt version, adds Renovate for automated future updates, and falls back to a system shfmt on unsupported platforms. Updates ADR-0001 to record that the maintenance concerns that prompted the original decision have been addressed.

* fix: chain CalledProcessError when raising InvalidShell from shfmt

  Preserves the original exception context so tracebacks show the full cause (missing binary, permission error, etc.) rather than only the InvalidShell message.

* fix(format): address PR review comments on shell formatting

  - Tighten double-brace regex to match non-greedily.
  - Broaden heredoc start regex to support custom delimiters like !EOF!.
  - Update README to explain opaque mask design choice accurately.

* chore: lint

* feat: mask heredoc bodies before shfmt to support non-standard terminators

  shfmt requires heredoc terminators at column 0 (or leading tabs for <<-EOF). Snakemake shell blocks sometimes use escape-prefixed terminators like \n!EOF!, which shfmt rejects as unclosed heredocs.

  Pre-process masked code with _mask_heredocs before invoking shfmt: detect the user-intended terminator permissively (accepting whitespace and \X escape prefixes before the delimiter word), replace the body and terminator with a placeholder, let shfmt format the surrounding shell, then restore the original body and terminator verbatim via _unmask_heredocs. shfmt does not reformat heredoc body content anyway, so masking has no effect on formatting quality.

  Five new tests cover the reviewer's exact case, body preservation, multiple heredocs, surrounding shell formatting, and truly unclosed heredocs that should still raise InvalidShell. Also updates the README with a dedicated "Heredoc handling" section.

* docs(readme): collapse verbose sections into expandable details blocks

* fix: rename shadowed variable

  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: preserve original literal when format_python_string_literal finds no triple-quote match

  rstrip was applied before the no-match early return, so single-line strings and other non-triple-quoted literals were returned trimmed rather than verbatim. Store the original before rstripping and return it on the no-match path.

---------

BREAKING CHANGE: shell blocks in rules are now formatted by shfmt by default. Existing Snakefiles may see whitespace and structure changes in their shell directives. Use `--no-format-shell` or `format_shell = false` in `pyproject.toml` to opt out.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
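The trailing-newline bug and the later "preserve original literal" fix described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the code from this commit: the pattern is a simplified stand-in for `_TRIPLE_QUOTE_RE`, and `format_literal` and its `transform` argument are hypothetical names used only for illustration.

```python
import re

# Hypothetical, simplified stand-in for snakefmt's _TRIPLE_QUOTE_RE.
# The alternation "{3}|'{3} follows the form quoted in the commit message.
TRIPLE_QUOTE_RE = re.compile(
    r"""(?P<quote>"{3}|'{3})(?P<body>.*?)(?P=quote)""", re.DOTALL
)


def format_literal(literal: str, transform) -> str:
    """Apply `transform` to the body of a triple-quoted literal.

    Literals that are not triple-quoted are returned verbatim, including
    any trailing whitespace they arrived with.
    """
    original = literal  # keep the untrimmed input for the no-match path
    match = TRIPLE_QUOTE_RE.fullmatch(literal.rstrip())
    if match is None:
        return original
    quote = match.group("quote")
    return f"{quote}{transform(match.group('body'))}{quote}"
```

Without the `rstrip()`, a literal that ends in a newline after the closing quotes never satisfies `fullmatch`, so the body is silently left untouched. Whether the real function re-appends the stripped whitespace is not stated in the commit message; this sketch does not.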
1 parent 6990f79 commit a8d131b

18 files changed

Lines changed: 1778 additions & 44 deletions

README.md

Lines changed: 198 additions & 39 deletions
@@ -20,6 +20,7 @@ design and specifications of [Black][black].
2020
> **Recent Changes:**
2121
> 1. **Rule and module directives are now sorted by default:** `snakefmt` will automatically sort the order of directives inside rules (e.g. `input`, `output`, `shell`) and modules into a consistent order. You can opt out of this by using the `--no-sort` CLI flag.
2222
> 2. **Black upgraded to v26:** The underlying `black` formatter has been upgraded to v26. You will see changes in how implicitly concatenated strings are wrapped (they are now collapsed onto a single line if they fit within the line limit) and other minor adjustments compared to previous versions.
23+
> 3. **Shell blocks are now formatted using `shfmt`:** `snakefmt` now formats the body of `shell:` directives using [`shfmt`](https://github.com/mvdan/sh). This is enabled by default and will reformat shell code that was previously left untouched. You can opt out with `--no-format-shell` (`-F`) or `format_shell = false` in `pyproject.toml`. See [Shell Block Formatting](#shell-block-formatting) for details.
2324
>
2425
> **Example of expected differences:**
2526
> ```python
@@ -46,7 +47,7 @@ design and specifications of [Black][black].
4647
4748
[TOC]: #
4849
49-
# Table of Contents
50+
## Table of Contents
5051
- [Install](#install)
5152
- [PyPi](#pypi)
5253
- [Conda](#conda)
@@ -56,6 +57,7 @@ design and specifications of [Black][black].
5657
- [Usage](#usage)
5758
- [Basic Usage](#basic-usage)
5859
- [Full Usage](#full-usage)
60+
- [Shell Block Formatting](#shell-block-formatting)
5961
- [Directive Sorting](#directive-sorting)
6062
- [Format Directives](#format-directives)
6163
- [Configuration](#configuration)
@@ -233,6 +235,9 @@ snakefmt - < Snakefile
233235
234236
### Full Usage
235237
238+
<details>
239+
<summary>Show full help output</summary>
240+
236241
```
237242
$ snakefmt --help
238243
Usage: snakefmt [OPTIONS] [SRC]...
@@ -246,47 +251,56 @@ Usage: snakefmt [OPTIONS] [SRC]...
246251
Snakefile` to avoid this.
247252
248253
Options:
249-
-l, --line-length INT Lines longer than INT will be wrapped. [default:
250-
88]
251-
-s, --sort / -S, --no-sort Sort directives in rules and modules. [default:
252-
sort]
253-
--check Don't write the files back, just return the
254-
status. Return code 0 means nothing would
255-
change. Return code 1 means some files would be
256-
reformatted. Return code 123 means there was an
257-
error.
258-
-d, --diff Don't write the files back, just output a diff
259-
for each file to stdout.
260-
--compact-diff Same as --diff but only shows lines that would
261-
change plus a few lines of context.
262-
--include PATTERN A regular expression that matches files and
263-
directories that should be included on recursive
264-
searches. An empty value means all files are
265-
included regardless of the name. Use forward
266-
slashes for directories on all platforms
267-
(Windows, too). Exclusions are calculated
268-
first, inclusions later. [default:
269-
(\.smk$|^Snakefile)]
270-
--exclude PATTERN A regular expression that matches files and
271-
directories that should be excluded on recursive
272-
searches. An empty value means no paths are
273-
excluded. Use forward slashes for directories on
274-
all platforms (Windows, too). Exclusions are
275-
calculated first, inclusions later. [default: (
276-
\.snakemake/|\.eggs/|\.git/|\.hg/|\.mypy_cache/|
277-
\.nox/|\.tox/|\.venv/|\.svn/|_build/|buck-
278-
out/|/build/|/dist/|\.template/)]
279-
-c, --config PATH Read configuration from PATH. By default, will
280-
try to read from `./pyproject.toml`
281-
-h, --help Show this message and exit.
282-
-V, --version Show the version and exit.
283-
-v, --verbose Turns on debug-level logger.
254+
-l, --line-length INT Lines longer than INT will be wrapped.
255+
[default: 88]
256+
-s, --sort / -S, --no-sort Sort directives in rules and modules.
257+
[default: sort]
258+
-f, --format-shell / -F, --no-format-shell
259+
Format shell directives using shfmt.
260+
[default: format-shell]
261+
--check Don't write the files back, just return the
262+
status. Return code 0 means nothing would
263+
change. Return code 1 means some files would
264+
be reformatted. Return code 123 means there
265+
was an error.
266+
-d, --diff Don't write the files back, just output a
267+
diff for each file to stdout.
268+
--compact-diff Same as --diff but only shows lines that
269+
would change plus a few lines of context.
270+
--include PATTERN A regular expression that matches files and
271+
directories that should be included on
272+
recursive searches. An empty value means
273+
all files are included regardless of the
274+
name. Use forward slashes for directories
275+
on all platforms (Windows, too). Exclusions
276+
are calculated first, inclusions later.
277+
[default: (\.smk$|^Snakefile)]
278+
--exclude PATTERN A regular expression that matches files and
279+
directories that should be excluded on
280+
recursive searches. An empty value means no
281+
paths are excluded. Use forward slashes for
282+
directories on all platforms (Windows, too).
283+
Exclusions are calculated first, inclusions
284+
later. [default: (\.snakemake/|\.eggs/|\.gi
285+
t/|\.hg/|\.mypy_cache/|\.nox/|\.tox/|\.venv/
286+
|\.svn/|_build/|buck-
287+
out/|/build/|/dist/|\.template/)]
288+
-c, --config PATH Read configuration from PATH. By default,
289+
will try to read from `./pyproject.toml`
290+
-h, --help Show this message and exit.
291+
-V, --version Show the version and exit.
292+
-v, --verbose Turns on debug-level logger.
284293
```
285294
295+
</details>
296+
286297
### Directive Sorting
287298
288299
By default, `snakefmt` sorts rule and module directives (like `input`, `output`, `shell`, etc.) into a consistent order. This makes rules easier to read and allows for quicker cross-referencing between inputs, outputs, and the resources used by the execution command.
289300
301+
<details>
302+
<summary>Directive ordering details</summary>
303+
290304
Directives are grouped by their functional role in the following order:
291305
292306
1. **Identity & Early Control**: `name`, `default_target`
@@ -300,8 +314,129 @@ Directives are grouped by their functional role in the following order:
300314
301315
This ordering ensures that the directives most frequently used in execution blocks (like `threads`, `resources`, and `params`) are placed immediately above the action directive.
302316
317+
</details>
318+
303319
You can disable this feature using the `--no-sort` flag.
304320
321+
### Shell Block Formatting
322+
323+
By default, `snakefmt` formats the body of `shell:` directives using [`shfmt`](https://github.com/mvdan/sh).
324+
This keeps shell snippets in your Snakefiles formatted consistently and avoids cosmetic diffs triggering unnecessary Snakemake re-runs.
325+
326+
#### Example
327+
328+
Before:
329+
330+
```python
331+
rule align:
332+
input:
333+
"reads.fq",
334+
output:
335+
"aligned.bam",
336+
threads: 4
337+
shell:
338+
"""
339+
bwa mem -t {threads} ref.fa {input} | samtools sort -o {output} -
340+
if [ -s {output} ]
341+
then
342+
echo "done"
343+
else
344+
echo "empty"
345+
exit 1
346+
fi
347+
"""
348+
```
349+
350+
After:
351+
352+
```python
353+
rule align:
354+
input:
355+
"reads.fq",
356+
output:
357+
"aligned.bam",
358+
threads: 4
359+
shell:
360+
"""
361+
bwa mem -t {threads} ref.fa {input} | samtools sort -o {output} -
362+
if [ -s {output} ]; then
363+
echo "done"
364+
else
365+
echo "empty"
366+
exit 1
367+
fi
368+
"""
369+
```
370+
371+
#### Disabling
372+
373+
You can disable shell formatting on the command line with `--no-format-shell` (`-F`), or in `pyproject.toml`:
374+
375+
```toml
376+
[tool.snakefmt]
377+
format_shell = false
378+
```
379+
380+
`shfmt` is invoked with `-i 4 -ci -bn` (four-space indentation, indented switch cases, binary operators may start a line).
381+
382+
<details>
383+
<summary>Advanced details: placeholders, heredocs, brace groups, invalid shell</summary>
384+
385+
#### Snakemake placeholders
386+
387+
Snakemake `{var}` placeholders are masked before `shfmt` runs so it does not mis-parse them, then restored verbatim afterwards.
388+
Escaped double-brace placeholders such as those required by `awk` are passed through unchanged:
389+
390+
```python
391+
rule example:
392+
shell:
393+
"""
394+
awk '{{print $1}}' {input} > {output}
395+
"""
396+
```
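The two-layer masking idea (double braces first, then single `{var}` placeholders) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation in this commit: the function names `mask` and `unmask`, the token format, and both regular expressions are hypothetical.

```python
import re
import uuid

# Hypothetical patterns: escaped double-brace spans are matched non-greedily
# and masked first, so the single-brace pass never sees their inner braces.
DOUBLE_BRACE_RE = re.compile(r"\{\{.*?\}\}", re.DOTALL)
SINGLE_BRACE_RE = re.compile(r"\{[^{}]*\}")


def mask(code: str) -> tuple[str, dict[str, str]]:
    """Replace brace spans with opaque tokens and remember the originals."""
    nonce = uuid.uuid4().hex  # makes collisions with user text very unlikely
    originals: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        token = f"__SNAKEFMT_{nonce}_{len(originals)}__"
        originals[token] = match.group(0)
        return token

    masked = DOUBLE_BRACE_RE.sub(replace, code)
    masked = SINGLE_BRACE_RE.sub(replace, masked)
    return masked, originals


def unmask(code: str, originals: dict[str, str]) -> str:
    """Restore every token to the exact text it replaced."""
    for token, original in originals.items():
        code = code.replace(token, original)
    return code
```

In this sketch the masked text contains no braces at all, so a shell formatter would see only ordinary words where the placeholders were; `unmask` then restores the original spans verbatim.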
397+
398+
#### Brace groups
399+
400+
Bash brace groups (`{ cmd1; cmd2; }`) appear as `{{ cmd1; cmd2; }}` in Snakemake shell strings
401+
because Snakemake renders the block through `str.format()`, which requires `{{` / `}}` to produce
402+
literal `{` / `}`. `snakefmt` preserves these double-brace sequences verbatim — the body inside
403+
a brace group is **not** internally reformatted by `shfmt`. This is an implementation trade-off:
404+
safely unescaping, formatting, and re-escaping the contents without disrupting Snakemake's
405+
variable interpolation introduces significant parser complexity, so opaque masking is used for
406+
simplicity and safety.
407+
408+
If you need `shfmt` to format the body of a brace group, wrap it in `# fmt: off` / `# fmt: on`
409+
and format that section manually.
410+
411+
#### Heredoc handling
412+
413+
`snakefmt` masks heredoc bodies before passing shell code to `shfmt`, and restores them afterwards. This means:
414+
415+
- **Heredoc bodies are never reformatted.** `shfmt` does not reformat heredoc content, and neither does `snakefmt`.
416+
- **Snakemake-style escape prefixes on the terminator are supported.** For example, a heredoc that ends with `\n!EOF!` instead of a bare `!EOF!` at column 0 is detected and handled correctly — no `# fmt: off` required.
417+
418+
```
419+
shell:
420+
"""
421+
python <<!EOF!
422+
\nif True:
423+
print("hello")
424+
\n!EOF!
425+
"""
426+
```
427+
428+
Standard heredoc forms (`<<EOF`, `<<-EOF`, `<<'EOF'`) are also supported and the terminator placement requirement (column 0 for `<<EOF`, leading tabs only for `<<-EOF`) is preserved after formatting.
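Keeping terminators in place when a formatted block is re-indented can be illustrated with a small helper. This is a sketch only, loosely modelled on the `_indent_preserving_heredocs` helper named in the commit message; the regular expression and the function body are assumptions, and the sketch handles only the standard forms, not escape-prefixed terminators such as `\n!EOF!`.

```python
import re

# Assumed pattern: matches <<EOF, <<-EOF, <<'EOF', <<"EOF" and custom words
# such as <<!EOF!, while ignoring here-strings (<<<).
HEREDOC_START_RE = re.compile(r"(?<!<)<<(?!<)(-?)\s*(['\"]?)([^\s'\"<>|&;]+)\2")


def indent_preserving_heredocs(code: str, prefix: str) -> str:
    """Indent shell code, leaving heredoc bodies and terminators untouched."""
    out = []
    terminator = None  # delimiter of the heredoc we are currently inside
    strip_tabs = False  # True for <<-EOF, where leading tabs are allowed
    for line in code.splitlines(keepends=True):
        if terminator is not None:
            out.append(line)  # body and terminator lines stay verbatim
            candidate = line.rstrip("\n")
            if strip_tabs:
                candidate = candidate.lstrip("\t")
            if candidate == terminator:
                terminator = None
            continue
        # Blank lines are not indented, matching textwrap.indent's default.
        out.append(prefix + line if line.strip() else line)
        match = HEREDOC_START_RE.search(line)
        if match:
            strip_tabs = match.group(1) == "-"
            terminator = match.group(3)
    return "".join(out)
```

A plain `textwrap.indent` would push the `EOF` line to column 4, which bash would no longer recognise as the terminator of a `<<EOF` heredoc.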
429+
430+
#### Invalid shell
431+
432+
If `shfmt` cannot parse the shell body, `snakefmt` raises an `InvalidShell` error rather than silently leaving the block unformatted.
433+
To work around genuinely invalid shell, either:
434+
435+
- Disable shell formatting for the whole run with `-F` / `--no-format-shell`, or
436+
- Wrap the rule in `# fmt: off` / `# fmt: on` directives (see below) to opt that block out.
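
The error path can be sketched with exception chaining, so that the traceback keeps the underlying process failure. This is a hedged illustration rather than snakefmt's code: `InvalidShell` here is a local stand-in class, and `run_formatter` is a hypothetical helper that takes the command as a parameter.

```python
import subprocess


class InvalidShell(Exception):
    """Local stand-in for snakefmt's InvalidShell exception."""


def run_formatter(code: str, cmd: list[str]) -> str:
    """Pipe `code` through `cmd` and return its stdout.

    A non-zero exit status is re-raised as InvalidShell, chained to the
    original CalledProcessError via `from err`.
    """
    try:
        result = subprocess.run(
            cmd, input=code, capture_output=True, text=True, check=True
        )
    except subprocess.CalledProcessError as err:
        raise InvalidShell(err.stderr.strip() or "formatter failed") from err
    return result.stdout
```

With `shfmt` on the `PATH`, a call such as `run_formatter(code, ["shfmt", "-i", "4", "-ci", "-bn"])` would use the flags listed above. In this sketch a missing binary surfaces as `FileNotFoundError`, not `InvalidShell`.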
437+
438+
</details>
439+
305440
### Format Directives
306441
307442
`snakefmt` supports comment directives to control formatting behaviour for specific regions of code.
@@ -334,6 +469,9 @@ rule c:
334469
335470
> **Note:** inside `run:` blocks and other Python contexts, `# fmt: off` / `# fmt: on` is passed through to [Black][black], which handles it natively.
336471
472+
<details>
473+
<summary>Additional directives: <code># fmt: off[sort]</code>, <code># fmt: off[next]</code>, <code># fmt: skip</code></summary>
474+
337475
#### `# fmt: off[sort]`
338476
339477
Disables directive sorting for the enclosed region while still applying all other formatting.
@@ -383,6 +521,8 @@ rule also_formatted:
383521
> **Note:** `# fmt: skip` is not yet supported within Snakemake rule blocks.
384522
> It currently applies only to plain Python lines outside of rules, checkpoints, and similar Snakemake constructs.
385523
524+
</details>
525+
386526
### Configuration
387527
388528
`snakefmt` is able to read project-specific default values for its command line options
@@ -405,6 +545,8 @@ configuration file.
405545
[tool.snakefmt]
406546
line_length = 90
407547
include = '\.smk$|^Snakefile|\.py$'
548+
sort_directives = true # sort rule directives into a consistent order (default: true)
549+
format_shell = true # format shell: blocks with shfmt (default: true)
408550
409551
# snakefmt passes these options on to black
410552
[tool.black]
@@ -433,7 +575,7 @@ To do so, create the file `.pre-commit-config.yaml` in the root of your project
433575
```yaml
434576
repos:
435577
- repo: https://github.com/snakemake/snakefmt
436-
rev: v0.10.2 # Replace by any tag/version ≥v0.6.0 : https://github.com/snakemake/snakefmt/releases
578+
rev: v1.1.0 # Replace by any tag/version ≥v0.6.0 : https://github.com/snakemake/snakefmt/releases
437579
hooks:
438580
- id: snakefmt
439581
```
@@ -444,7 +586,11 @@ Then [install pre-commit](https://pre-commit.com/#installation) and initialize t
444586
445587
[GitHub Actions](https://github.com/features/actions) in combination with [super-linter](https://github.com/github/super-linter) allows you to automatically run `snakefmt` on all Snakefiles in your repository e.g. whenever you push a new commit.
446588
447-
To do so, create the file `.github/workflows/linter.yml` in your repository:
589+
<details>
590+
<summary>Show GitHub Actions workflow configuration</summary>
591+
592+
Create `.github/workflows/linter.yml` in your repository:
593+
448594
```yaml
449595
---
450596
name: Lint
@@ -485,11 +631,14 @@ jobs:
485631
```
486632
487633
Additional configuration parameters can be specified by creating `.github/linters/.snakefmt.toml`:
634+
488635
```toml
489636
[tool.black]
490637
skip_string_normalization = true
491638
```
492639
640+
</details>
641+
493642
For more information check the `super-linter` readme.
494643
495644
## Plug Us
@@ -499,6 +648,9 @@ in your project.
499648
500649
[![Code style: snakefmt](https://img.shields.io/badge/code%20style-snakefmt-000000.svg)](https://github.com/snakemake/snakefmt)
501650
651+
<details>
652+
<summary>Copy badge markup</summary>
653+
502654
### Markdown
503655
504656
```md
@@ -512,6 +664,8 @@ in your project.
512664
:target: https://github.com/snakemake/snakefmt
513665
```
514666
667+
</details>
668+
515669
## Changes
516670
517671
See [`CHANGELOG.md`][changes].
@@ -524,7 +678,10 @@ See [CONTRIBUTING.md][contributing].
524678
525679
[![DOI][doi-shield]][doi]
526680
527-
```Bibtex
681+
<details>
682+
<summary>BibTeX</summary>
683+
684+
```bibtex
528685
@article{snakemake2021,
529686
doi = {10.12688/f1000research.29032.2},
530687
url = {https://doi.org/10.12688/f1000research.29032.2},
@@ -539,6 +696,8 @@ See [CONTRIBUTING.md][contributing].
539696
}
540697
```
541698
699+
</details>
700+
542701
543702
[snakemake]: https://snakemake.readthedocs.io/
544703
[black]: https://black.readthedocs.io/en/stable/
