Replies: 5 comments
-
If I'm not mistaken, it passes over the file twice. In the first pass it collects the information it needs to decide on the duplicates and their representatives, and in the second pass it marks and emits the reads.
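To make the two-pass idea concrete, here is a minimal sketch in Java. It is not Picard's code: the `Read` class, the string grouping key, and `markDuplicates` are all hypothetical, and the score is just an integer supplied by the caller. Pass 1 records the best-scoring read per key; pass 2 walks the reads again and flags everything that is not the representative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of two-pass duplicate marking (not Picard's classes).
class TwoPassSketch {

    /** A toy read: a name, a grouping key such as "chr1:100:+", and a score. */
    static final class Read {
        final String name;
        final String key;
        final int score;

        Read(String name, String key, int score) {
            this.name = name;
            this.key = key;
            this.score = score;
        }
    }

    /** Returns one flag per input read; true means "marked as duplicate". */
    static List<Boolean> markDuplicates(List<Read> reads) {
        // Pass 1: remember the index of the best-scoring read for each key.
        // Ties keep the read that was seen first.
        Map<String, Integer> bestIndexByKey = new HashMap<>();
        for (int i = 0; i < reads.size(); i++) {
            Read r = reads.get(i);
            Integer best = bestIndexByKey.get(r.key);
            if (best == null || r.score > reads.get(best).score) {
                bestIndexByKey.put(r.key, i);
            }
        }

        // Pass 2: walk the reads again and flag every non-representative read.
        List<Boolean> flags = new ArrayList<>();
        for (int i = 0; i < reads.size(); i++) {
            int representative = bestIndexByKey.get(reads.get(i).key);
            flags.add(representative != i);
        }
        return flags;
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
                new Read("a", "chr1:100:+", 30),
                new Read("b", "chr1:100:+", 45),
                new Read("c", "chr2:5:-", 20));
        System.out.println(markDuplicates(reads)); // prints [true, false, false]
    }
}
```

The point of the second pass is that a read cannot be flagged when it is first seen, because a better-scoring read with the same key may still come later in the file.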
-
Hi @yfarjoun, thanks for the response. I looked over the code earlier and I don't think it is a two-pass algorithm. Could you point me to the piece of code that does the two passes? Thanks!
-
First, at picard/sam/markduplicates/MarkDuplicates.java:270, it calls Then, in the loop controlled by picard/sam/markduplicates/MarkDuplicates.java:338, the iterator (opened in picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.SamHeaderAndIterator) is consumed and the reads are marked using the information collected in the first pass.
-
Right, that's for
-
I just tested it and confirmed that it's a bug. Here's the test SAM file:
Running it as follows:
picard UmiAwareMarkDuplicatesWithMateCigar --INPUT test.sam --OUTPUT test_md.bam --METRICS_FILE metrics.txt --UMI_METRICS_FILE umi_metrics.txt
Results:
-
I saw that DuplicateScoringStrategy allows SUM_OF_BASE_QUALITIES to be set as the scoring strategy. However, I am confused because it only takes the base qualities of the record in question and ignores the mate. Wouldn't that mean that, for paired-end reads, the read chosen in the DuplicateSet may be different from the read chosen in the mate's DuplicateSet? I'm asking because UmiAwareMarkDuplicatesWithMateCigar uses this scoring strategy (inherited from MarkDuplicates). Hope someone could clarify! Thanks @nh13 @jacarey
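To illustrate the concern, here is a toy Java sketch. It is not Picard or htsjdk code: `Pair`, `sum`, and the three `winner...` helpers are hypothetical, and the score is a plain sum of base qualities with no filtering. It only shows that scoring each end on its own can pick a different representative in each end's duplicate set, whereas a pair-level score picks one consistently. Whether the tool actually behaves like the per-end variant is exactly the open question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical model of the question: per-end scoring versus pair-level scoring.
class PairScoringSketch {

    /** A toy read pair with the base qualities of each end. */
    static final class Pair {
        final String name;
        final int[] read1Quals;
        final int[] read2Quals;

        Pair(String name, int[] read1Quals, int[] read2Quals) {
            this.name = name;
            this.read1Quals = read1Quals;
            this.read2Quals = read2Quals;
        }
    }

    /** Plain sum of base qualities (a simplification, assumed for illustration). */
    static int sum(int[] quals) {
        int total = 0;
        for (int q : quals) {
            total += q;
        }
        return total;
    }

    /** Name of the highest-scoring pair; assumes a non-empty list, first wins ties. */
    static String winner(List<Pair> pairs, ToIntFunction<Pair> score) {
        Pair best = pairs.get(0);
        for (Pair p : pairs) {
            if (score.applyAsInt(p) > score.applyAsInt(best)) {
                best = p;
            }
        }
        return best.name;
    }

    // Per-end scoring: each duplicate set only looks at its own read.
    static String winnerByFirstEnd(List<Pair> pairs) {
        return winner(pairs, p -> sum(p.read1Quals));
    }

    static String winnerBySecondEnd(List<Pair> pairs) {
        return winner(pairs, p -> sum(p.read2Quals));
    }

    // Pair-level scoring: both ends contribute to a single score.
    static String winnerByPair(List<Pair> pairs) {
        return winner(pairs, p -> sum(p.read1Quals) + sum(p.read2Quals));
    }

    public static void main(String[] args) {
        List<Pair> pairs = Arrays.asList(
                new Pair("A", new int[] {30, 30, 30}, new int[] {10, 10, 10}),
                new Pair("B", new int[] {25, 25, 25}, new int[] {30, 30, 30}));
        System.out.println(winnerByFirstEnd(pairs));  // prints A (90 vs 75)
        System.out.println(winnerBySecondEnd(pairs)); // prints B (30 vs 90)
        System.out.println(winnerByPair(pairs));      // prints B (120 vs 165)
    }
}
```

In this toy input, per-end scoring keeps read 1 of pair A and read 2 of pair B, so the two mates of the same template end up with different duplicate flags, which is the inconsistency being asked about.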