New module: modkit/extract/full #11285

Open
sahuno wants to merge 1 commit into nf-core:master from sahuno:add-modkit-extract-full

sahuno commented Apr 24, 2026

PR checklist

  • This comment contains a description of changes (with reason).
  • Stub + real tests added.
  • New tool follows module conventions.
  • Versions broadcast via `topic: versions`.
  • Naming/parameter/I-O conventions followed.
  • Resource label set (`process_high`).
  • BioConda + BioContainers used.
  • `nf-core modules lint modkit/extract/full` — 49/0/0.
  • `nf-test test --profile conda` — 2/2 passed.

Summary

Adds a new nf-core module wrapping `modkit extract full`, which transforms the MM/ML tags in a modBAM into a tab-separated per-read, per-position probability table. The output contains one row for every modified-base probability call in every read.

The module auto-detects `--bgzf` in `ext.args` and adjusts the output filename suffix accordingly (`.tsv` vs `.tsv.gz`), so users don't get a misleading extension when enabling compression.
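The suffix selection described above can be illustrated with a minimal sketch. This is not the module's code (the module itself is a Nextflow process); it is a Python approximation of the logic, and the function names `output_suffix` and `output_name`, the `sample1` prefix, and the example argument strings are hypothetical.

```python
import shlex


def output_suffix(ext_args: str) -> str:
    """Return the output suffix implied by the extra CLI arguments (hypothetical helper)."""
    # Tokenize the argument string the way a shell would, so that a token
    # merely containing "--bgzf" (e.g. inside a path) is not treated as the flag.
    tokens = shlex.split(ext_args or "")
    return "tsv.gz" if "--bgzf" in tokens else "tsv"


def output_name(prefix: str, ext_args: str) -> str:
    """Build the output filename from a sample prefix and the extra arguments."""
    return f"{prefix}.{output_suffix(ext_args)}"


print(output_name("sample1", "--bgzf --threads 4"))  # sample1.tsv.gz
print(output_name("sample1", ""))                    # sample1.tsv
```

Matching on whole tokens rather than on a substring is a design choice in this sketch; whether the module does the same is not stated in this description.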

Why

`modkit extract full` is the source of truth for read-level methylation probabilities and is essential for custom downstream filtering, phased methylation plots, and ML training on raw probability distributions. It is paired with `modkit extract calls` (companion PR), which emits thresholded categorical calls.

Test data

Uses the existing `test.sorted.phased.bam` from nf-core/test-datasets (modules branch). No new test data required.

🤖 Generated with Claude Code

Add new nf-core module wrapping `modkit extract full`, which transforms
the MM/ML tags in a modBAM into a tab-separated per-read-per-position
probability table. Output can be BGZF-compressed via `--bgzf` in
`ext.args`. Useful for downstream custom filtering, plotting, and ML
training on read-level methylation probabilities.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sahuno force-pushed the add-modkit-extract-full branch from 3c8814f to 46887a7 on April 24, 2026 02:44