feat: add dorado/aligner module#11222

Draft
sahuno wants to merge 1 commit into nf-core:master from sahuno:add-dorado-aligner


sahuno commented Apr 18, 2026

Description

Adds dorado/aligner — a wrapper around Oxford Nanopore's dorado aligner, which uses minimap2 under the hood to align unaligned ONT BAMs (e.g. produced by dorado/basecaller) while preserving modified base tags (MM/ML) and other BAM auxiliary tags.

CPU-only (no GPU required): dorado aligner wraps minimap2, which runs on the CPU. A GPU is only needed for basecalling.

Why a separate module from dorado/basecaller?

This follows the nf-core convention of one tool-subcommand per module. It is also the approach recommended by @Kevin-Brockers / @dialvarezs in #11122 (dorado/basecaller review). Keeping basecalling and alignment separate lets users:

  • Use different CPU/memory profiles (basecall needs GPU, aligner doesn't)
  • Swap the aligner for minimap2/align if preferred
  • Resume the pipeline from the aligner step without re-basecalling

Test data

Test paths depend on nf-core/test-datasets#1969 (unaligned HG002 GIAB 10-read BAM). Tests were verified locally against the same files before opening this PR; CI should pass once #1969 merges.

Verified real-test output:

[info] Running: "aligner" "--threads" "2" "genome.fasta" "HG002_PAW70337_giab_10reads.unaligned.bam"
[info] > Reads written: 10
[info] > total/primary/unmapped 21/6/4
[info] > Finished in (ms): 107
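
For reviewers who want to reproduce the run outside Nextflow, the argument order in the log above can be sketched as a small Python helper. This is only an illustration: `build_aligner_cmd` is a hypothetical name, not part of the module, and the sketch assembles the command without executing it (running it would need `dorado` on the `PATH`).

```python
import shlex


def build_aligner_cmd(reference, reads, threads=2):
    """Assemble the argv seen in the log: aligner --threads N <reference> <reads>.

    Hypothetical helper for illustration; the module itself builds this
    command inside its Nextflow script block.
    """
    return ["dorado", "aligner", "--threads", str(threads), reference, reads]


cmd = build_aligner_cmd(
    "genome.fasta",
    "HG002_PAW70337_giab_10reads.unaligned.bam",
)

# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```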

Snapshot strategy

The real-test snapshot intentionally captures only the filename and dorado version (not the BAM MD5), because dorado aligner embeds absolute paths in the @SQ UR: and @PG CL: BAM header lines, and these vary between test environments. The stub-test snapshot uses the full process.out, which is stable because all stub files are empty (created with touch).
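
To illustrate why a whole-file MD5 is unstable here, the Python sketch below hashes two made-up SAM headers that differ only in the work directory embedded in `UR:` and `CL:`. The header text, paths, and helper names are all hypothetical, and the module's tests do not do this; they simply avoid snapshotting the BAM checksum.

```python
import hashlib


def header_digest(header_text):
    """MD5 of the header text, standing in for a checksum of the file."""
    return hashlib.md5(header_text.encode()).hexdigest()


def strip_path_fields(header_text):
    """Drop UR: from @SQ lines and CL: from @PG lines (the path-bearing tags)."""
    kept = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@SQ":
            fields = [f for f in fields if not f.startswith("UR:")]
        elif fields[0] == "@PG":
            fields = [f for f in fields if not f.startswith("CL:")]
        kept.append("\t".join(fields))
    return "\n".join(kept)


def make_header(workdir):
    """Build a hypothetical header as it might look when run from `workdir`."""
    return "\n".join([
        "@HD\tVN:1.6\tSO:unknown",
        f"@SQ\tSN:chr1\tLN:1000\tUR:{workdir}/genome.fasta",
        f"@PG\tID:aligner\tPN:dorado\tCL:dorado aligner {workdir}/genome.fasta {workdir}/reads.bam",
    ])


header_a = make_header("/tmp/work/ab/123")
header_b = make_header("/home/ci/work/cd/456")

# The raw headers hash differently purely because of the embedded paths.
raw_differs = header_digest(header_a) != header_digest(header_b)

# With the path-bearing fields removed, the two headers are identical.
stripped_matches = (
    header_digest(strip_path_fields(header_a))
    == header_digest(strip_path_fields(header_b))
)

print(raw_differs, stripped_matches)
```

Snapshotting only the filename and version sidesteps the problem without post-processing the BAM, at the cost of not checking the alignment content itself.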

PR checklist

  • Description of changes (above)
  • New tool — followed module conventions (main.nf, meta.yml, environment.yml, tests/)
  • Test data: uses files from nf-core/test-datasets#1969 ("Add ONT BAM and bedMethyl test data: HG002 GIAB 10-read subset (PAW70337)")
  • No TODO statements
  • Versions broadcast via topic: versions
  • Naming conventions followed
  • Input/output options per guidelines
  • Resource label: process_high (CPU/RAM scales with input)
  • Container: docker.io/nanoporetech/dorado:shac8f356489fa8b44b31beba841b84d2879de2088e (ONTPL license — not on bioconda; vendor container pattern matches nf-core/parabricks)
  • nf-core modules test dorado/aligner --profile singularity
  • nf-core modules lint dorado/aligner ✅ (2 expected warnings re: vendor Docker Hub image, same as dorado/basecaller)

Conda and Docker profile tests have not been run yet: dorado is not on bioconda, and Docker is not available on my dev host, so I am relying on CI to validate the Docker profile.

🤖 Generated with Claude Code

Wraps `dorado aligner` (minimap2) to align unaligned ONT BAMs (e.g. from
dorado/basecaller) while preserving modification tags (MM/ML). CPU-only —
no GPU required.

- main.nf: DORADO_ALIGNER process, process_high label, vendor container
- meta.yml: EDAM ontologies for BAM/FASTA/FAI/TSV inputs and outputs
- tests: stub + real test against GIAB HG002 unaligned BAM and hg38 slice
  (snapshot reduced to stable fields to avoid BAM header path drift)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>