fix(trimgalore): clean stale outputs on same-workdir retry #11308

Open
pinin4fjords wants to merge 3 commits into nf-core:master from
pinin4fjords:pinin4fjords/trimgalore-paired-cleanup

Conversation

@pinin4fjords
Member

@pinin4fjords pinin4fjords commented Apr 25, 2026

Summary

Three changes to the trimgalore module:

1. Stale-output cleanup at the top of the script block (the actual fix)

Add to both SE and PE branches:

rm -f *.fq.gz *.html *.zip *_trimming_report.txt

Makes the module idempotent on a same-workdir retry. Pattern is exhaustive over trim_galore's emit globs and cannot match input symlinks (those are *.fastq.gz, not *.fq.gz).
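
A minimal sketch of the cleanup's effect, run in a throwaway directory. The file names are hypothetical stand-ins; only the `rm -f` line is taken from this PR, and the inputs are assumed to be staged as `*.fastq.gz` symlinks as described above.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical; the rm pattern is the one from the PR.
workdir=$(mktemp -d) || exit 1
inputs=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Staged inputs: symlinks named *.fastq.gz (assumption stated in the PR).
touch "$inputs/sample_1.fastq.gz" "$inputs/sample_2.fastq.gz"
ln -s "$inputs/sample_1.fastq.gz" sample_1.fastq.gz
ln -s "$inputs/sample_2.fastq.gz" sample_2.fastq.gz

# Leftovers from a hypothetical interrupted attempt.
touch sample_1_trimmed.fq.gz \
      sample_1.fastq.gz_trimming_report.txt \
      sample_1_val_1_fastqc.html \
      sample_1_val_1_fastqc.zip

# Cleanup line from the PR. With -f, a pattern that matches nothing is not an error.
rm -f *.fq.gz *.html *.zip *_trimming_report.txt

# List what survived: only the two input symlinks should remain.
remaining=$(ls | tr '\n' ' ' | sed 's/ $//')
echo "$remaining"
```

The stale outputs are gone and the two input symlinks are untouched, because `*.fq.gz` does not match a name ending in `.fastq.gz`. Inputs staged with a `.fq.gz` suffix would fall outside that assumption.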

2. Pin cutadapt in environment.yml

bioconda::trim-galore=0.6.10 does not bound cutadapt, so the conda solver picks the latest. Add bioconda::cutadapt=5.2 to match the docker container build (quay.io/biocontainers/trim-galore:0.6.10--hdfd78af_2). Same pattern as the nf-core/cutadapt module's environment.yml.

3. Filter the python-version line out of the snapshotted log

The trim_galore log contains This is cutadapt 5.2 with Python X.Y.Z. The X.Y.Z is whatever python conda happens to resolve, which is not something the module is responsible for. Pinning python in environment.yml just to satisfy the snapshot is brittle (every patch release would need an env bump). Instead, filter that line out via findAll { !it.startsWith("This is cutadapt") } and drop it from the snapshot. Cutadapt version is still asserted via the separate Cutadapt version: 5.2 line in the log header. This matches the strategy used by the nf-core/cutadapt module's tests, where the comment is // python versions differ in the default conda env and container.
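
As a rough shell analogue of that filter (the module's tests use the Groovy `findAll` shown above, not this script), the sketch below applies the same "drop lines starting with `This is cutadapt`" rule to two stand-in logs. Each stand-in contains only the two lines quoted in this PR, with the Python versions mentioned in the commit messages; real trim_galore logs contain much more.

```shell
#!/bin/sh
# Stand-in logs: only the two lines quoted in the PR, not real trim_galore output.
log_docker=$(printf '%s\n' 'Cutadapt version: 5.2' \
                           'This is cutadapt 5.2 with Python 3.12.12')
log_conda=$(printf '%s\n' 'Cutadapt version: 5.2' \
                          'This is cutadapt 5.2 with Python 3.13.13')

# Shell equivalent of findAll { !it.startsWith("This is cutadapt") }.
drop_python_line() { grep -v '^This is cutadapt'; }

filtered_docker=$(printf '%s\n' "$log_docker" | drop_python_line)
filtered_conda=$(printf '%s\n' "$log_conda" | drop_python_line)

# The raw logs differ by Python version; the filtered logs should not.
if [ "$filtered_docker" = "$filtered_conda" ]; then
    echo "filtered logs match"
fi
```

The cutadapt version line survives the filter in both cases, so the snapshot still pins cutadapt while ignoring the Python patch level.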

Why (1) is needed

When a job retries in the same workdir as a partially-completed previous attempt, an intermediate <prefix>_1_trimmed.fq.gz from the failed attempt can survive into the successful retry. The reads output glob *{3prime,5prime,trimmed,val}{,_1,_2}.fq.gz matches all three resulting files (_trimmed, _val_1, _val_2), and downstream consumers expecting 1-2 fastqs (notably fq/lint after #11227 added arity) fail.
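
The arity problem can be reproduced in miniature. The sketch below uses hypothetical file names and a regular expression that approximates the `reads` glob (Nextflow's own glob matching is not invoked here), then counts matches with and without the cleanup.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical; the regex approximates the glob
# *{3prime,5prime,trimmed,val}{,_1,_2}.fq.gz from the module's reads output.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

reads_re='(3prime|5prime|trimmed|val)(_[12])?\.fq\.gz$'
count_reads() { ls | grep -E -c "$reads_re"; }

# Orphan left by the interrupted attempt, plus the outputs of a successful retry.
touch sample_1_trimmed.fq.gz
touch sample_1_val_1.fq.gz sample_2_val_2.fq.gz
before=$(count_reads)

# Same retry, but with the cleanup run first.
rm -f *.fq.gz
touch sample_1_val_1.fq.gz sample_2_val_2.fq.gz
after=$(count_reads)

echo "without cleanup: $before, with cleanup: $after"
```

Without the cleanup the pattern picks up three files; with it, only the two validated reads remain, which is what a consumer expecting 1-2 fastqs needs.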

Reported via nf-core/rnaseq users on 3.23.0+.

Related

Notes for reviewers

  • Fix applied to both SE and PE branches symmetrically; SE is exposed to the same orphan risk in principle.
  • No emit-glob change.
  • Snapshot delta is just the removal of the This is cutadapt ... Python ... line in five places.
  • Tested locally with docker; CI matrix is green on this branch.

🤖 Generated with Claude Code

trim_galore overwrites its outputs on re-run but never deletes orphans
from a prior interrupted attempt. When AWS Batch retries a job in the
same workdir after a Spot reclaim, an intermediate `*_trimmed.fq.gz`
written by the failed attempt can survive into the successful retry,
get matched by the `reads` output glob, and break downstream consumers
that expect 1-2 fastq inputs (e.g. fq/lint).

Reported via nf-core/rnaseq users running 3.23.0+ on Spot+Fusion. See
also nf-core/rnaseq#1807.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pinin4fjords pinin4fjords marked this pull request as ready for review April 25, 2026 20:26
pinin4fjords and others added 2 commits April 25, 2026 23:21
The docker container `quay.io/biocontainers/trim-galore:0.6.10--hdfd78af_2`
ships with cutadapt 5.2 and Python 3.12.12. Without explicit pins, the
conda solver picks newer versions (currently Python 3.13.13), which
desyncs from the docker container and breaks the snapshot test that
captures the trim_galore log line `This is cutadapt 5.2 with Python X.Y.Z`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cutadapt log line "This is cutadapt 5.2 with Python X.Y.Z" includes
the runtime Python version, which is a function of whatever conda
resolves at solve time and not something the module is responsible
for. Pinning Python in environment.yml just to satisfy the snapshot
is brittle - every patch release would need an env bump.

Filter that line out of the snapshotted log chunks instead. Cutadapt
version is still asserted via the separate "Cutadapt version: 5.2"
header line, and cutadapt itself remains pinned in environment.yml
because it actually drives trimming behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>