Dorado for basecalling vs modification detection #354

baibhav-bioinfo · 2025-01-27T18:48:31Z

Hello everyone,
I am new to nanopore DRS datasets and am still figuring out the different file formats and tools.

"Dorado basecaller" can be used to basecall sequence reads from the raw pod5 files produced by nanopore sequencing machines.

"Dorado basecaller" can also be used to detect modifications in the sequences by adding some extra command arguments.

As both commands produce "calls.bam" files, I wanted to know: what is the difference between the outputs of these two commands?
Is the only difference the presence or absence of modification information?
If I convert the BAM files into FASTQ files to get the actual sequence reads, will both be the same?

marcus1487 · 2025-01-27T19:01:33Z

Modified bases are output as a set of BAM/SAM tags. Details about these tags and how to run Dorado for modified base detection can be found in the Dorado documentation.

Here are the key points to address your question:

Canonical Basecalling vs. Modified Base Detection: Modified base calls are generated after canonical basecalling is complete, meaning the sequence field in the BAM file will be identical in both cases.
FASTQ Conversion: The FASTQ format does not natively support BAM/SAM tags, so modified base calls will be lost during a default conversion. However, you can preserve the tags in the FASTQ header lines by using the samtools fastq -T "*" command. Note that downstream support for such files depends on the tool in question.
Recommendation: Whenever possible, we recommend using tools within the Dorado ecosystem that can directly process BAM files with modified base calls.
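
To make the two points above concrete, here is a minimal Python sketch of what the conversion does to a single record. It is an illustration only, not how samtools is implemented: the function name sam_to_fastq, the read name, and the MM/ML values are all hypothetical, and it assumes unaligned records such as those in a fresh calls.bam. On real data you would run samtools fastq on the BAM instead, with or without -T "*".

```python
def sam_to_fastq(sam_line: str, keep_tags: bool = False) -> str:
    """Turn one unaligned SAM text record into a 4-line FASTQ record.

    Hypothetical helper for illustration. With keep_tags=True the optional
    fields (for example MM/ML) are appended to the header line, tab-separated,
    which is roughly the effect of `samtools fastq -T "*"`.
    """
    fields = sam_line.rstrip("\n").split("\t")
    name, seq, qual = fields[0], fields[9], fields[10]
    tags = fields[11:]  # optional TAG:TYPE:VALUE fields, may be empty
    header = "@" + name
    if keep_tags and tags:
        header += "\t" + "\t".join(tags)
    return f"{header}\n{seq}\n+\n{qual}\n"


# Made-up records: the same read from a canonical run and from a modified-base run.
canonical = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII"
modified = canonical + "\tMM:Z:A+a,0;\tML:B:C,200"

# Default conversion: the tags are dropped, so both runs give the same FASTQ record.
print(sam_to_fastq(canonical) == sam_to_fastq(modified))

# Tag-preserving conversion: the modification tags travel in the header line.
print(sam_to_fastq(modified, keep_tags=True))
```

Because the sequence and quality fields are untouched by modified base calling, a plain FASTQ extracted from the modified-base BAM should match one extracted from a canonical run with the same basecalling model.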

baibhav-bioinfo · 2025-01-27T23:12:45Z

I have run the "modified basecalling" because I am conducting an m6A analysis.

So, if I want FASTQ files of just the sequences (for other analyses such as DEGs), can I use the BAM files I got from the same run, or do I have to run the Dorado canonical basecalling separately?
Also, how does Dorado basecalling behave with the polyA tails at the end of each read? I want to either keep the polyA tail entirely as it was in the DRS read or remove the whole thing; I don't want the polyA tails to be basecalled partially. Is there any way to do that?
Are there any papers that have used Dorado for m6A calling which I can use as a template for my study? Please let me know.

marcus1487 added the "question" label (Looking for clarification on inputs and/or outputs) · Jan 27, 2025
