Question: Does two reads with opposite strand with same mapping location regarded as duplicates? #67

zztin · 2023-07-05T09:43:00Z

Hi Andrew,

I have a nanopore sequencing library where one molecule could result in reads on both forward or reverse strands. I would like to know if preseq regard these pairs as "distinct" observation, or as "repeated" observation?

The definition of "distinct" observation is not very clearly stated in the documentation. I was trying to find it in the implementation:

Related lines:

preseq/src/load_data_for_complexity.cpp

Line 90 in c0fc953

curr_gr.get_start() != prev_gr.get_start())

Related class: GenomicRegion.hpp

It seems like get_start() is getting the smaller coordinate of a bam read?
If so, does it mean that both reverse and forward-mapped reads would be seen as the same molecule (same 'start') -- This is the desired behavior in my use case.

I would love to hear from you!

The command I used:

./preseq lc_extrap -B -o ./yield-estimates.txt <input-sorted-bam>

Software version:
v2.0 downloaded precompiled binary

Best,
Li-Ting

The text was updated successfully, but these errors were encountered:

andrewdavidsmith · 2023-07-06T02:02:14Z

@zztin I think you are probably right about the meaning of that code. Frankly, it's a weak criterion, but it should work for generating big-picture complexity results. It would not be a great approach if the purpose were, e.g., to accurately identify SNVs. It might lose information that would help at some important locus. Also, I've only ever considered how this should work for short reads.

Unfortunately I don't think I can answer your question now and I will leave this issue open, but here's one thing that might help: the best way to use Preseq is by providing a counts histogram (or the counts themselves). That would allow you to count duplicates according to your own definition.

I'll try go address this in more detail soon.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Does two reads with opposite strand with same mapping location regarded as duplicates? #67

Question: Does two reads with opposite strand with same mapping location regarded as duplicates? #67

zztin commented Jul 5, 2023

andrewdavidsmith commented Jul 6, 2023

Question: Does two reads with opposite strand with same mapping location regarded as duplicates? #67

Question: Does two reads with opposite strand with same mapping location regarded as duplicates? #67

Comments

zztin commented Jul 5, 2023

andrewdavidsmith commented Jul 6, 2023