feat!: format shell directives using shfmt#295
Conversation
Add shfmt-py as dependency to format Snakemake shell directives.
Mask variables like {input} to prevent shfmt mangling.
Ignore escaped braces like {{ }} during masking.
Extract and format multiline string literals safely.
Add --no-format-shell CLI flag to opt-out.
Shell_formatter internals are split into focused helpers
(_mask_snakemake_vars, _invoke_shfmt, _unmask_snakemake_vars) to give
future backend swaps a clean seam — see
docs/adr/0001-shell-formatter-distribution.md.
Fixes a bug where format_python_string_literal received literals with a
trailing newline from the parser (a NEWLINE token after the closing
triple-quote), causing re.fullmatch to return the string unchanged and
leaving shell blocks silently unformatted in all real snakefmt runs.
Fix: rstrip the literal before regex matching.
Adds a parametrised robustness suite covering all trailing-whitespace
input shapes, and end-to-end behavioural tests through the public
formatter and CLI.
Closes snakemake#170
- docs/adr/README.md: index for architecture decision records - docs/adr/0001-shell-formatter-distribution.md: records the decision to keep shfmt-py rather than forking or building Go bindings, with concrete revisit triggers - docs/plans/shell-formatting-trailing-newline-bugfix.md: working plan for the trailing-newline bug identified and fixed in the previous commit - README: new Shell Block Formatting section with before/after example, opt-out instructions, placeholder masking and invalid-shell notes; updated TOC, --help block, configuration example (sort_directives, format_shell), pre-commit rev (v0.10.2 -> v1.1.0), and Recent Changes callout
Adds format_shell = nil to config defaults. When set to false, --no-format-shell
is passed to the snakefmt binary, allowing users to disable shell block
formatting from their editor config without touching pyproject.toml.
require("snakefmt").setup({ format_shell = false })
nil (default) passes no flag, deferring to snakefmt's own default (on).
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds shfmt-driven formatting for Snakemake shell bodies with placeholder masking, heredoc-aware masking/indentation preservation, a new InvalidShell error path, Formatter/CLI/Lua plumbing to opt in/out, comprehensive tests, ADRs/plans, and README documentation updates. ChangesShell Block Formatting via shfmt
sequenceDiagram
participant PythonLiteral
participant MaskSnakemakeVars
participant MaskHeredocs
participant Shfmt
participant UnmaskHeredocs
participant UnmaskSnakemakeVars
participant FormatterOutput
PythonLiteral->>MaskSnakemakeVars: replace `{...}`/`{{...}}` with tokens
MaskSnakemakeVars->>MaskHeredocs: detect and replace heredoc bodies with placeholders
MaskHeredocs->>Shfmt: invoke shfmt subprocess with masked code
Shfmt->>UnmaskHeredocs: return formatted masked code
UnmaskHeredocs->>UnmaskSnakemakeVars: restore original heredoc bodies and tokens
UnmaskSnakemakeVars->>FormatterOutput: produce final formatted shell content
🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #295 +/- ##
==========================================
+ Coverage 96.52% 96.59% +0.07%
==========================================
Files 12 13 +1
Lines 1497 1617 +120
Branches 309 324 +15
==========================================
+ Hits 1445 1562 +117
- Misses 24 25 +1
- Partials 28 30 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@snakefmt/shell_formatter.py`:
- Around line 15-35: The mask template is globally fixed and can collide with
user text; change _mask_snakemake_vars to generate a per-call unique token
prefix (e.g., random/UUID-based string or monotonic nonce) and store the exact
tokens used in the returned originals tuple (or a separate tokens list), then
change _unmask_snakemake_vars to iterate in reverse and replace those exact
tokens with their originals using the stored tokens; update references to
_MASK_TEMPLATE in both _mask_snakemake_vars and _unmask_snakemake_vars (and the
similar code at lines ~56-58) so masking/unmasking uses the per-call tokens
rather than a global static format.
- Line 13: The current _SNAKEMAKE_VAR_PATTERN is too permissive and captures
arbitrary { ... } blocks (including shell constructs like fn(){...}); tighten it
so it only matches placeholders whose content begins with a letter or underscore
and consists solely of identifier-like characters (letters, digits, underscore
and limited punctuation such as dot/dash/colon) and explicitly disallow
characters used in shell blocks (parentheses, semicolons, whitespace, braces,
etc.), while preserving the existing negative lookarounds that prevent matching
double-brace sequences; update the _SNAKEMAKE_VAR_PATTERN definition accordingly
and adjust any tests that relied on the old behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e3e40cde-899c-44d4-bf9e-8fad742e65cc
⛔ Files ignored due to path filters (2)
pyproject.tomlis excluded by!pyproject.tomluv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
README.mddocs/adr/0001-shell-formatter-distribution.mddocs/adr/README.mddocs/plans/shell-formatting-trailing-newline-bugfix.mdlua/snakefmt/config.lualua/snakefmt/init.luasnakefmt/exceptions.pysnakefmt/formatter.pysnakefmt/shell_formatter.pysnakefmt/snakefmt.pytests/test_config.pytests/test_formatter.pytests/test_shell_formatter.py
…vars After the UUID nonce refactor, _mask_snakemake_vars returns (masked, tokens, originals) but format_shell_code still unpacked a 2-tuple. Update the call site to match the new signature.
|
@Hocnonsense it would be nice to get a review from you before merging this as it is a big feature addition and you and I are the most recently active members on this repository. |
Hocnonsense
left a comment
There was a problem hiding this comment.
Thanks for this contribution. The CLI/config plumbing looks solid, and the overall architecture is clean. I do have some concerns before this can merge, one of which is a blocker.
|
|
||
| The unformatted input uses an if/then/fi block that shfmt normalises to | ||
| `if ...; then` form — a visible, deterministic transformation we can assert on. | ||
| """ |
There was a problem hiding this comment.
-
More test about cases where too many/too few indents are necessary.
-
A basic heredoc to verify the terminator lands at column 0 after formatting.
-
A case with embedded
\nescape sequences and blank lines, e.g.:shell: """ python <<!EOF! \nif True: pass \n!EOF! """This covers shfmt's leave-heredoc-body-alone behaviour and confirms
textwrap.indent
does not corrupt the terminator or the blank lines.
There was a problem hiding this comment.
hopefully the new tests I've added cover these cases sufficiently for you?
- C1: unified two-layer brace masking — {{...}} is now masked before
single {var} placeholders, ensuring brace groups, parameter expansion,
brace expansion, awk patterns, and f-string Snakemake vars all survive
shfmt unchanged
- B2: replace textwrap.indent with _indent_preserving_heredocs, which
skips heredoc body and terminator lines so <<EOF terminators remain at
column 0 as bash requires
- T1/C2: add unit tests for _indent_preserving_heredocs, integration
heredoc tests with concrete expected values, and a long-line
no-spurious-wrap test confirming shfmt's flags are syntactic only
- M1: move format_shell from imperative post-construction attribute into
Formatter.__init__ (default False, matching sort_directives); update
setup_formatter and all TestShellBlockFormatting tests to opt in
explicitly with format_shell=True
- S1: consolidate two re.fullmatch calls into a single compiled
_TRIPLE_QUOTE_RE using "{3}|'{3} to avoid backslash escapes in raw
strings that confuse black's parser
- README: add "Brace groups" section documenting the {{...}} preservation
trade-off and the fmt: off escape hatch
v4.0.0 bundles shfmt v3.13.1 (latest upstream), decouples the package version from the shfmt version, adds Renovate for automated future updates, and falls back to a system shfmt on unsupported platforms. Updates ADR-0001 to record that the maintenance concerns that prompted the original decision have been addressed.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
tests/test_shell_formatter.py (2)
48-51:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winParametrized f-string test cases use incorrect syntax.
Lines 48-51 test string prefixes including
f,rb,B, andu, but all use{_UNFORMATTED_CONTENT}which contains the variable in single-brace form. For theprefix_fcase, this represents a Python f-string expression, not a Snakemake placeholder, and would fail at runtime if_UNFORMATTED_CONTENTisn't defined as a Python variable.The test at lines 150-157 correctly demonstrates f-string usage with
{{output}}, showing awareness of the issue.🔧 Suggested fix
Remove or update the f-string parametrized case:
(f'f"""\n{_UNFORMATTED_CONTENT}\n"""\n', f'f"""\n{_EXPECTED_CONTENT}"""'), - (f'rb"""\n{_UNFORMATTED_CONTENT}\n"""\n', f'rb"""\n{_EXPECTED_CONTENT}"""'), + (f'r"""\n{_UNFORMATTED_CONTENT}\n"""\n', f'r"""\n{_EXPECTED_CONTENT}"""'), (f'B"""\n{_UNFORMATTED_CONTENT}\n"""\n', f'B"""\n{_EXPECTED_CONTENT}"""'), (f'u"""\n{_UNFORMATTED_CONTENT}\n"""\n', f'u"""\n{_EXPECTED_CONTENT}"""'), ], ids=[ "trailing_newline", "multiple_trailing_newlines", "trailing_spaces_tabs", "crlf_line_endings", - "prefix_f", - "prefix_rb", + "prefix_r", "prefix_B", "prefix_u",Or replace the fixture with f-string-safe content (no Snakemake placeholders).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_shell_formatter.py` around lines 48 - 51, The parametrized test cases use Python f-string prefixes consuming {_UNFORMATTED_CONTENT} (seen in the tuples with f'...{_UNFORMATTED_CONTENT}...'), which will be evaluated at test runtime; update the test in tests/test_shell_formatter.py to avoid evaluating Snakemake placeholders: either remove the f-prefixed tuple (the one using f'"""...{_UNFORMATTED_CONTENT}..."""') or replace its content with an f-string-safe fixture (e.g., use literal double-braces like '{{output}}' or plain text without {…} placeholders) so that _UNFORMATTED_CONTENT/_EXPECTED_CONTENT remain valid for the shell formatter assertions.
19-21:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTest uses incorrect Snakemake syntax for f-strings.
This test uses
{input}inside an f-string shell block, but in f-strings,{input}is a Python expression (evaluated before the shell runs), not a Snakemake placeholder. The correct syntax for Snakemake variables in f-strings is{{input}}(which Python'sstr.format()collapses to{input}at DAG build time).This test validates the current implementation but tests incorrect behavior. Once the f-string handling issue in
shell_formatter.py(line 116-153) is addressed, this test should be updated or removed.🔧 Suggested fix
Either skip f-strings in the test:
def test_format_python_string_literal(): - literal = 'f"""\n echo {input}\n ls -l\n """' - expected = 'f"""\n echo {input}\n ls -l\n """' + literal = '"""\n echo {input}\n ls -l\n """' + expected = '"""\n echo {input}\n ls -l\n """' assert format_python_string_literal(literal, target_indent=2) == expectedOr use correct f-string syntax:
- literal = 'f"""\n echo {input}\n ls -l\n """' - expected = 'f"""\n echo {input}\n ls -l\n """' + literal = 'f"""\n echo {{input}}\n ls -l\n """' + expected = 'f"""\n echo {{input}}\n ls -l\n """'Based on learnings from past review: The parametrized test suite includes f-string cases using single-brace
{input}syntax, which is not valid for Snakemake shell blocks in f-strings.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_shell_formatter.py` around lines 19 - 21, The test in tests/test_shell_formatter.py uses an invalid Snakemake f-string placeholder `{input}`; update the test for format_python_string_literal to either skip f-string cases or use the correct Snakemake-safe f-string escaping `{{input}}` and adjust the expected string accordingly so the formatter is validated against correct syntax; refer to the f-string handling code in shell_formatter.py (the f-string handling block around lines 116-153) and the test helper format_python_string_literal to locate and update the failing case.snakefmt/shell_formatter.py (1)
116-153:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftCritical: f-string shell blocks will be incorrectly formatted.
format_python_string_literalapplies the same{var}masking logic to all string prefixes, but f-strings have inverted brace semantics:
Syntax Plain/raw string f-string {input}Snakemake variable Python expression (evaluated before shell runs) {{input}}Literal {input}passed to shellSnakemake variable (Python expands {{}}→{})This means:
{input}in an f-string will be masked as a Snakemake variable, but it's actually a Python expression that should reach shfmt as-is (or error if the variable doesn't exist){{input}}is the correct way to write a Snakemake variable in an f-string, but the negative lookbehind in line 19 prevents matching itShell blocks in f-strings will be mangled unless users avoid f-strings entirely for shell directives.
🛡️ Suggested fix: Skip formatting for f-strings
def format_python_string_literal(literal: str, target_indent: int = 0) -> str: """Extracts shell code from a Python string literal, formats it, and re-wraps it. Only multiline triple-quoted strings are formatted. Single-line strings and any form that doesn't match a simple triple-quote pattern are returned unchanged. The literal is expected to arrive from `str(parameter)` — the caller guarantees no leading whitespace beyond what the parser appends (handled by `rstrip` below). """ # Strip trailing whitespace: the parser appends a NEWLINE token after the # closing triple-quote, so the literal arrives as '"""..."""\n'. The caller # already rstrips the result, so this is safe. literal = literal.rstrip() match = _TRIPLE_QUOTE_RE.fullmatch(literal) if not match: return literal # not a triple-quoted string literal; return unchanged prefix = match.group(1) + + # f-strings have inverted brace semantics; skip formatting to avoid mangling + if "f" in prefix.lower(): + return literal + quote = match.group(2) content = match.group(3)Based on learnings from past review: f-strings require
{{var}}for Snakemake placeholders, but the current masking pattern with negative lookbehind(?<!\{)explicitly excludes{{...}}, causing the formatter to produce invalid shell code for f-string shell blocks.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@snakefmt/shell_formatter.py` around lines 116 - 153, The function format_python_string_literal currently treats all triple-quoted strings the same, which mangles f-strings; change it to skip formatting when the string is an f-string: after extracting prefix = match.group(1) (and before calling format_shell_code), detect if the prefix contains an f or F (e.g., "f" in prefix.lower() or use any(c in prefix for c in "fF")) and if so return the original literal unchanged; keep the existing formatting path for non-f prefixes and leave _TRIPLE_QUOTE_RE and format_shell_code untouched.
🧹 Nitpick comments (2)
tests/test_shell_formatter.py (1)
110-110: 💤 Low valueRename
varsto avoid shadowing the builtin.The variable name
varsshadows Python's builtinvars()function. While unlikely to cause issues in this test context, it reduces clarity and triggers linter warnings.♻️ Suggested rename
- vars = [f"{{var{i}}}" for i in range(15)] - unformatted = "echo " + " ".join(vars) + "\n" - expected = "echo " + " ".join(vars) + "\n" + placeholders = [f"{{var{i}}}" for i in range(15)] + unformatted = "echo " + " ".join(placeholders) + "\n" + expected = "echo " + " ".join(placeholders) + "\n"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_shell_formatter.py` at line 110, The test creates a list assigned to the name `vars` which shadows the builtin vars(); rename that variable (e.g., to `var_placeholders` or `vars_list`) wherever it is used in the test to avoid shadowing and linter warnings—update the list comprehension assignment (`vars = [f"{{var{i}}}" for i in range(15)]`) and all subsequent references to the new name (look for usages in the test function in tests/test_shell_formatter.py).snakefmt/shell_formatter.py (1)
56-56: ⚡ Quick winConsider explicit
strict=forzip()to catch logic errors.If
tokensandoriginalshave mismatched lengths due to a bug in masking/unmasking logic, the silent truncation could corrupt shell code. Addingstrict=True(Python 3.10+) makes the error explicit.♻️ Proposed addition
- for token, original in zip(reversed(tokens), reversed(originals)): + for token, original in zip(reversed(tokens), reversed(originals), strict=True): code = code.replace(token, original)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@snakefmt/shell_formatter.py` at line 56, The loop using zip(reversed(tokens), reversed(originals)) can silently truncate if tokens and originals lengths mismatch; update the call to zip with strict=True (i.e., zip(..., strict=True)) to raise an error on length mismatch so masking/unmasking bugs fail fast; change the loop in shell_formatter.py that iterates over reversed(tokens) and reversed(originals) to use zip(..., strict=True) and run tests to ensure compatibility with Python 3.10+.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@snakefmt/shell_formatter.py`:
- Line 73: The current exception handling raises InvalidShell while suppressing
the original CalledProcessError traceback; update the raise to chain the
original exception (use `raise InvalidShell(f"Invalid shell code. shfmt error:
{e.stderr}") from e`) so the original CalledProcessError and its context
(missing binary, permission, etc.) is preserved—locate the raise of InvalidShell
inside the shell formatting code that catches CalledProcessError (referencing
the variable `e`) and replace the `from None` suppression with `from e`.
---
Duplicate comments:
In `@snakefmt/shell_formatter.py`:
- Around line 116-153: The function format_python_string_literal currently
treats all triple-quoted strings the same, which mangles f-strings; change it to
skip formatting when the string is an f-string: after extracting prefix =
match.group(1) (and before calling format_shell_code), detect if the prefix
contains an f or F (e.g., "f" in prefix.lower() or use any(c in prefix for c in
"fF")) and if so return the original literal unchanged; keep the existing
formatting path for non-f prefixes and leave _TRIPLE_QUOTE_RE and
format_shell_code untouched.
In `@tests/test_shell_formatter.py`:
- Around line 48-51: The parametrized test cases use Python f-string prefixes
consuming {_UNFORMATTED_CONTENT} (seen in the tuples with
f'...{_UNFORMATTED_CONTENT}...'), which will be evaluated at test runtime;
update the test in tests/test_shell_formatter.py to avoid evaluating Snakemake
placeholders: either remove the f-prefixed tuple (the one using
f'"""...{_UNFORMATTED_CONTENT}..."""') or replace its content with an
f-string-safe fixture (e.g., use literal double-braces like '{{output}}' or
plain text without {…} placeholders) so that
_UNFORMATTED_CONTENT/_EXPECTED_CONTENT remain valid for the shell formatter
assertions.
- Around line 19-21: The test in tests/test_shell_formatter.py uses an invalid
Snakemake f-string placeholder `{input}`; update the test for
format_python_string_literal to either skip f-string cases or use the correct
Snakemake-safe f-string escaping `{{input}}` and adjust the expected string
accordingly so the formatter is validated against correct syntax; refer to the
f-string handling code in shell_formatter.py (the f-string handling block around
lines 116-153) and the test helper format_python_string_literal to locate and
update the failing case.
---
Nitpick comments:
In `@snakefmt/shell_formatter.py`:
- Line 56: The loop using zip(reversed(tokens), reversed(originals)) can
silently truncate if tokens and originals lengths mismatch; update the call to
zip with strict=True (i.e., zip(..., strict=True)) to raise an error on length
mismatch so masking/unmasking bugs fail fast; change the loop in
shell_formatter.py that iterates over reversed(tokens) and reversed(originals)
to use zip(..., strict=True) and run tests to ensure compatibility with Python
3.10+.
In `@tests/test_shell_formatter.py`:
- Line 110: The test creates a list assigned to the name `vars` which shadows
the builtin vars(); rename that variable (e.g., to `var_placeholders` or
`vars_list`) wherever it is used in the test to avoid shadowing and linter
warnings—update the list comprehension assignment (`vars = [f"{{var{i}}}" for i
in range(15)]`) and all subsequent references to the new name (look for usages
in the test function in tests/test_shell_formatter.py).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2272da25-0e77-480e-afca-d0198bb6eae0
⛔ Files ignored due to path filters (2)
pyproject.tomlis excluded by!pyproject.tomluv.lockis excluded by!**/*.lock
📒 Files selected for processing (9)
README.mddocs/adr/0001-shell-formatter-distribution.mddocs/plans/pr-295-review-response.mdsnakefmt/formatter.pysnakefmt/shell_formatter.pysnakefmt/snakefmt.pytests/__init__.pytests/test_formatter.pytests/test_shell_formatter.py
✅ Files skipped from review due to trivial changes (1)
- docs/plans/pr-295-review-response.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/adr/0001-shell-formatter-distribution.md
- tests/test_formatter.py
Preserves the original exception context so tracebacks show the full cause (missing binary, permission error, etc.) rather than only the InvalidShell message.
|
Thanks for the awesome review @Hocnonsense Feel free to go through the updates when you get a chance and let me know what you think. |
Hocnonsense
left a comment
There was a problem hiding this comment.
Thanks for the updates — the heredoc-aware indent and the two-layer {{...}} masking address the core structural issues from the first round. A few follow-up concerns remain; most are enhancements or edge cases rather than blockers, but there are two correctness gaps worth noting before merge.
|
|
||
| **Decision: mask all `{{...}}` patterns** uniformly across plain strings and f-strings. The formatter does not need to model runtime semantics — it preserves every brace-bracketed token verbatim. | ||
|
|
||
| **Trade-off (accepted):** bash brace groups `{{ cmd1; cmd2; }}` lose internal formatting because shfmt sees only an opaque token. This case is rare in Snakemake shell directives (`&&` chains and line continuations cover most pipelines). Users who need brace-group formatting can use `# fmt: off`. |
There was a problem hiding this comment.
Within the opaque-mask design the current ordering ({{...}} first, then {var}) is a valid implementation choice — fixing C1a is sufficient to make it work.
However, reversing the order unlocks a cleaner pipeline that allows shfmt to actually format brace group bodies:
- Mask Snakemake
{var}→ tokens — invariant: no bare{…}remain after this step - Unescape
{{→{and}}→}— safe, because any{…}here came from{{…}} - Run shfmt on the actual shell code (brace groups visible and formattable)
- Re-escape all
{→{{and}→}}— unambiguous for the same reason - Unmask tokens →
{var}
The invariant from step 1 makes steps 2 and 4 unambiguous without any position tracking. Crucially, the output still contains {{...}}, so str.format() is unaffected at runtime.
The README currently states:
reformatting the body would require splitting the
{{/}}delimiters, which would breakstr.format()at runtime.
This is incorrect — the pipeline above does not split the delimiters; it round-trips them through unescape → re-escape and the final output is identical in structure to the input. The real trade-off is implementation complexity, not a technical constraint.
If the simpler opaque-mask approach is retained, the README note should be corrected to reflect the actual reason (complexity) rather than the stated one (impossibility).
There was a problem hiding this comment.
README has been updated accordingly
|
|
||
| `_unmask_snakemake_vars` already does the right thing — iterates `(token, original)` pairs in reverse and substitutes. No signature change. | ||
|
|
||
| **F-string handling:** with this fix, f-strings are formatted normally. No special-case skip. The reviewer's underlying concern (wrong masking semantics for f-strings) is resolved by the unified two-layer mask. |
There was a problem hiding this comment.
This is only partially true. With {{...}} masked first, both {{output}} and {input} in an f-string are correctly masked as tokens ✓
However, f-string {...} pairing is determined by Python's parser, not by regex. Applying a single regex across both f-string expressions and Snakemake variables simultaneously risks incorrect matches when bracket pairs span both contexts — for example, {{func {params.a} and params: a= "balabala}}". Although rare, the regex cannot distinguish these without understanding Python syntax.
The safest fix remains skipping f-strings entirely: if "f" in prefix.lower(): return literal.
There was a problem hiding this comment.
I think this is addressed now with the new bracket masking strategy. We don't ignore f-strings, but we are more "aggressive" with the masking
Tighten double-brace regex to match non-greedily. Broaden heredoc start regex to support custom delimiters like !EOF!. Update README to explain opaque mask design choice accurately.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_shell_formatter.py (1)
110-110: ⚡ Quick winRename
varsto avoid shadowing the builtin.The variable name
varsshadows Python's builtinvars()function. Consider renaming tovar_listorvariables.♻️ Proposed fix
- vars = [f"{{var{i}}}" for i in range(15)] - unformatted = "echo " + " ".join(vars) + "\n" - expected = "echo " + " ".join(vars) + "\n" + var_list = [f"{{var{i}}}" for i in range(15)] + unformatted = "echo " + " ".join(var_list) + "\n" + expected = "echo " + " ".join(var_list) + "\n"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_shell_formatter.py` at line 110, The test creates a list named `vars` which shadows the built-in vars() function; rename that variable (e.g., to `var_list` or `variables`) everywhere it is used in the test (the list comprehension that currently reads `vars = [f"{{var{i}}}" for i in range(15)]` and any subsequent references) so the builtin is not shadowed and intent is clearer.snakefmt/shell_formatter.py (1)
56-56: ⚡ Quick winConsider adding
strict=Truetozip()for defensive programming.Adding
strict=Truewould catch any bugs wheretokensandoriginalshave mismatched lengths, making debugging easier if the masking invariant is ever violated.♻️ Proposed fix
- for token, original in zip(reversed(tokens), reversed(originals)): + for token, original in zip(reversed(tokens), reversed(originals), strict=True): code = code.replace(token, original) return code🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@snakefmt/shell_formatter.py` at line 56, The loop using zip(reversed(tokens), reversed(originals)) should use strict=True to detect mismatched lengths early; change the iteration to zip(reversed(tokens), reversed(originals), strict=True) so any length invariant violations between tokens and originals raise immediately (refer to the reversed(tokens)/reversed(originals) zip in snakefmt/shell_formatter.py and the variables tokens and originals).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_shell_formatter.py`:
- Line 240: The test's long string literal assigned to expected exceeds the
88-char limit; shorten the line by splitting the string into multiple shorter
string literals joined implicitly inside parentheses or using concatenation so
the assignment to expected (the long triple-quoted shell snippet) is broken
across lines and each line stays under 88 chars; update the test value in
tests/test_shell_formatter.py so the content is unchanged but the source line
lengths comply with flake8.
---
Nitpick comments:
In `@snakefmt/shell_formatter.py`:
- Line 56: The loop using zip(reversed(tokens), reversed(originals)) should use
strict=True to detect mismatched lengths early; change the iteration to
zip(reversed(tokens), reversed(originals), strict=True) so any length invariant
violations between tokens and originals raise immediately (refer to the
reversed(tokens)/reversed(originals) zip in snakefmt/shell_formatter.py and the
variables tokens and originals).
In `@tests/test_shell_formatter.py`:
- Line 110: The test creates a list named `vars` which shadows the built-in
vars() function; rename that variable (e.g., to `var_list` or `variables`)
everywhere it is used in the test (the list comprehension that currently reads
`vars = [f"{{var{i}}}" for i in range(15)]` and any subsequent references) so
the builtin is not shadowed and intent is clearer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 88a4d78c-3e49-40eb-800d-f8c58d08c9d8
📒 Files selected for processing (3)
README.mdsnakefmt/shell_formatter.pytests/test_shell_formatter.py
|
As always, great review @Hocnonsense. I think I have addressed these valid edge cases. But if I have missed anything I am happy to continue iterating on fixes. I want this feature to be well tested before it is released into the wild. Get as pedantic as you want. |
Hocnonsense
left a comment
There was a problem hiding this comment.
Here is how snakefmt run on my snakefile, and I've seen some point:
- without
# fmt: off, shfmt can fail:$ uv run snakefmt -d gene_arg.smk subprocess.CalledProcessError: Command '['shfmt', '-i', '4', '-ci', '-bn']' returned non-zero exit status 1. The above exception was the direct cause of the following exception: snakefmt.exceptions.InvalidShell: Invalid shell code. shfmt error: <standard input>:18:16: unclosed here-document `!EOF!` - indent of the docstring changes (probably out of scope).
- code should be formatted to run full tests
$ uv run snakefmt -d gene_arg.smk
=====> Diff for gene_arg.smk <=====
rule fq_ARGs_OAP:
input:
r1="{any}_1.fq.gz",
r2="{any}_2.fq.gz",
output:
tsv=["{any}-args_oap" f"_{i}.tsv" for i in ["type", "subtype", "gene"]],
shadow:
"shallow"
conda:
"gene_arg.yaml"
threads: 8
params:
suffix="fq", # suffix should exclude prefix "." and can ignore suffix ".gz"
data_type=["type", "subtype", "gene"],
outfolder="{any}-args_oap",
# fmt: off
shell:
"""
rm -f smk-args_oap
mkdir smk-args_oap/{{input,output}} -p
ln -rs {input} smk-args_oap/input
args_oap stage_one \
-i smk-args_oap/input -o smk-args_oap/output \
-f {params.suffix} -t {threads}
args_oap stage_two \
-i smk-args_oap/output -t {threads}
if [ -f {params.outfolder} ]; then
mv {params.outfolder} smk-args_oap/old_output
fi
mv smk-args_oap/output {params.outfolder}
python <<!EOF!
\nif True:
from pathlib import Path
import pandas as pd
txt_files = (
pd.Series(Path("{params.outfolder}").rglob("*.txt"))
.rename("path")
.to_frame()
.assign(
Name=lambda df: df["path"].apply(lambda p: p.name),
Index=lambda df: df["Name"].apply(lambda p: p.rsplit(".", 2)[0]),
Type=lambda df: df["Name"].apply(lambda p: p.rsplit(".", 2)[1]),
)
)
out_types = "{params.data_type}".split()
metadata = (
txt_files.pipe(lambda df: df[df["Type"].apply(lambda t: t in out_types)])
.pivot_table(values="path", index="Index", columns="Type", aggfunc=set)
.explode(out_types)
.dropna()
)
for data_type, outfile in zip(out_types, "{output.tsv}".split()):
concat: pd.DataFrame = pd.concat(
metadata[data_type]
.apply(lambda x: pd.read_csv(x, sep="\t").set_index(data_type))
.to_dict()
)
norm = (
concat.reset_index()
.melt(id_vars=["level_0", data_type], var_name="Sample")
.pivot_table(
values="value", index=[data_type, "Sample"], columns="level_0"
)
)
norm.to_csv(outfile, sep="\t")
\n!EOF!
"""
rule faa_rgi:
"""
- [home page](https://card.mcmaster.ca/)
? ----
+ [home page](https://card.mcmaster.ca/)
- this software can also annotate from contig, but it should call prodigal to predict itself
? ----
+ this software can also annotate from contig, but it should call prodigal to predict itself
- """
+
+ """
input:
faa="{prefix}.faa",
output:
tsv="{prefix}-rgi_prot.tsv",
shadow:
"shallow"
conda:
"rgi" # mamba create -n rgi --channel conda-forge --channel bioconda --channel defaults rgi
threads: 8
shell:
"""
rm -f smk-rgi
mkdir -p smk-rgi/
rgi main \
-i {input.faa} -o smk-rgi/output \
-t protein \
--clean --num_threads {threads}
"""
rule download_ISFinder:
output:
faa="{database}/ISfinder-sequences/IS.faa",
fna="{database}/ISfinder-sequences/IS.fna",
csv="{database}/ISfinder-sequences/IS.csv",
conda:
"gene_arg.yaml"
params:
outdir="{database}",
reposity="https://github.com/thanhleviet",
shell:
"""
mkdir {params.outdir} -p
cd {params.outdir}
/bin/rm ISfinder-sequences -r
git clone {params.reposity}/ISfinder-sequences
"""
rule download_VFDB_fna:
output:
fas="{database}/VFDB/VFDB_core.fna",
conda:
"gene_arg.yaml"
params:
outdir="{database}",
homepage="http://www.mgc.ac.cn/VFs",
file="VFDB_setA_nt.fas.gz",
shell:
"""
(
- cd `dirname {output.fas}`
? ^ ^
+ cd $(dirname {output.fas})
? ^^ ^
curl {params.homepage}/Down/{params.file} \
- |gunzip -c \
+ | gunzip -c \
? ++++ +
- > `basename {output.fas}`
? ^^ ^
+ >$(basename {output.fas})
? ++++++++ ^^ ^
)
"""
[INFO] All done 🎉
…ators shfmt requires heredoc terminators at column 0 (or leading tabs for <<-EOF). Snakemake shell blocks sometimes use escape-prefixed terminators like \n!EOF!, which shfmt rejects as unclosed heredocs. Pre-process masked code with _mask_heredocs before invoking shfmt: detect the user-intended terminator permissively (accepting whitespace and \X escape prefixes before the delimiter word), replace the body and terminator with a placeholder, let shfmt format the surrounding shell, then restore the original body and terminator verbatim via _unmask_heredocs. shfmt does not reformat heredoc body content anyway, so masking has no effect on formatting quality. Five new tests cover the reviewer's exact case, body preservation, multiple heredocs, surrounding shell formatting, and truly unclosed heredocs that should still raise InvalidShell. Also updates the README with a dedicated "Heredoc handling" section.
|
@Hocnonsense, I have tested your exact example locally and it seems to work all okay now. Test it out and see what you think. I also added some extra new tests with your case, body preservation, multiple heredocs, surrounding shell formatting, and the unclosed-heredoc error path. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/plans/pr-295-heredoc-masking.md`:
- Around line 11-19: The Markdown uses bare fenced code blocks that trigger
MD040; update the first exception block containing
"snakefmt.exceptions.InvalidShell: Invalid shell code. shfmt error:" to use a
language identifier (e.g., replace the opening ``` with ```text) and update the
second block that starts with "shell:" and contains the heredoc/`python <<!EOF!`
snippet to use a language identifier (e.g., replace the opening ``` with
```python) so both fenced blocks include explicit languages.
In `@README.md`:
- Around line 418-426: The code block showing the heredoc for the shell/python
example uses an unlabeled fenced block which triggers markdownlint MD040; update
the opening fence to include the language by changing the triple backticks to
use "python" (i.e., replace ``` with ```python) for the block that contains the
"shell:" heredoc and the "python <<!EOF!" content so the block is properly
language-labeled.
In `@snakefmt/shell_formatter.py`:
- Around line 194-199: The code currently calls literal.rstrip() before doing
_TRIPLE_QUOTE_RE.fullmatch, so the no-match branch returns the trimmed value
instead of the original; preserve the exact original input by either performing
the regex match on the unmodified literal (move the rstrip after matching) or
store the original in a temp (e.g., original = literal) then use trimmed =
literal.rstrip() for matching/processing and return original when match is None;
update the logic around literal, match, and _TRIPLE_QUOTE_RE.fullmatch
accordingly.
In `@tests/test_shell_formatter.py`:
- Around line 108-113: The test function test_format_shell_code_many_variables
shadows the builtin vars by using the variable name vars; rename that variable
(e.g., to variables or vars_list) and update all uses within the function (the
list comprehension, unformatted, and expected constructions) to the new name so
Ruff A001 is resolved and behavior remains unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5bccfc87-7df3-4521-ac98-dc6339ac8e45
📒 Files selected for processing (4)
README.mddocs/plans/pr-295-heredoc-masking.mdsnakefmt/shell_formatter.pytests/test_shell_formatter.py
Hocnonsense
left a comment
There was a problem hiding this comment.
Wow it's indeed a really nice solution. LGTM!
Thanks for your great effort!
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…s no triple-quote match rstrip was applied before the no-match early return, so single-line strings and other non-triple-quoted literals were returned trimmed rather than verbatim. Store the original before rstripping and return it on the no-match path.
Great! I'll wait and see if any of the others want to review. If not, I'll push this sometime next week. |
Summary
shell:directives usingshfmt, enabled by default (-f/--format-shell, opt-out with-F/--no-format-shellorformat_shell = falseinpyproject.toml){var}placeholders are masked before formatting and restored verbatim; escaped{{...}}braces are passed through unchangedInvalidShellrather than crashing silently; escape hatches are-For# fmt: offshfmt-py— no separate user install required; distribution decision documented in ADR-0001format_python_string_literalsilently no-oped in real runs due to a trailing newline appended by the parser after the closing triple-quoteformat_shelloption in the Neovim plugin (require("snakefmt").setup({ format_shell = false }))Closes #170
Test plan
tests/test_shell_formatter.pycovering all trailing-whitespace input shapes (trailing\n, multiple\n, spaces/tabs, CRLF, string prefix variantsf,rb,B,u)tests/test_formatter.py::TestShellBlockFormatting: default-on formats,format_shell=Falsedisables,# fmt: offopts out,{var}/{{escaped}}placeholders survive the full pipelinetests/test_config.py::TestShellFormattingConfig:format_shell = false/truein TOML via CLI--configuv run pytest— 288 passed, 1 xfailedSummary by CodeRabbit
New Features
Behavioral Improvements
Documentation
Tests