When reads are demultiplexed with a read structure that indicates the presence of UMIs (i.e. one that includes `M`; e.g. `146T8B9M8B146T` for a 9 bp UMI), the resulting BAM files include per-read UMI sequences in the `RX` tag. Our current picard-based demultiplexing handles UMIs as expected (the `RX` tags are present), but the ref-based assembly yields aligned BAMs in which the `RX` tags are missing. We should figure out at what stage the `RX` tags are first lost, and find a way to either preserve them or re-annotate the reads in the output BAM file.
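To make the read-structure example concrete, here is a minimal sketch that locates the UMI cycles in a structure string such as `146T8B9M8B146T`. It is only an illustration: the function names `parse_read_structure` and `umi_intervals` are hypothetical, and the sketch assumes just the `T`, `B`, `M` and `S` operators with explicit lengths. It is not how picard parses read structures internally.

```python
import re

# One segment is a positive length followed by an operator.
# Assumption: only T (template), B (sample barcode), M (molecular barcode/UMI)
# and S (skip) are handled here.
SEGMENT = re.compile(r"(\d+)([TBMS])")


def parse_read_structure(structure):
    """Split a read structure such as '146T8B9M8B146T' into (length, operator) pairs."""
    segments = []
    pos = 0
    for match in SEGMENT.finditer(structure):
        if match.start() != pos:
            # There is text between segments that the pattern did not recognise.
            raise ValueError(f"unparseable read structure: {structure!r}")
        segments.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if not segments or pos != len(structure):
        raise ValueError(f"unparseable read structure: {structure!r}")
    return segments


def umi_intervals(structure):
    """Return 0-based, half-open (start, end) cycle intervals for every M segment."""
    intervals = []
    offset = 0
    for length, operator in parse_read_structure(structure):
        if operator == "M":
            intervals.append((offset, offset + length))
        offset += length
    return intervals


if __name__ == "__main__":
    print(umi_intervals("146T8B9M8B146T"))
```

For the structure in this issue, the 9 bp UMI occupies cycles 154 to 162 (0-based), which is the sequence that should end up in each read's `RX` tag.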
Related to the above, we should have a UMI-aware ref-based assembly pipeline that uses picard's `UmiAwareMarkDuplicatesWithMateCigar` to deduplicate reads, taking UMIs into account. In this pipeline, the reads will need to be aligned to the reference once to determine alignment coordinates, then deduplicated via `UmiAwareMarkDuplicatesWithMateCigar`, then aligned again. The final output should contain only aligned reads that are distinct by UMI and position (while tolerating some level of mismatch in the UMIs).
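The deduplication criterion described above (distinct by position and by UMI, with some mismatch tolerance) can be sketched in plain Python. This is a simplified greedy grouping for illustration only, not picard's algorithm: the tuple layout `(name, contig, pos, strand, umi)`, the function names, and the `max_mismatches` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length UMI strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def dedup_by_umi(reads, max_mismatches=1):
    """Keep one read per (contig, pos, strand) and UMI group.

    `reads` is an iterable of hypothetical (name, contig, pos, strand, umi)
    tuples, as they might be extracted after the first alignment pass.
    A read is dropped as a duplicate if an already-kept read at the same
    position has a UMI within `max_mismatches` mismatches.
    """
    by_position = defaultdict(list)
    for read in reads:
        _name, contig, pos, strand, _umi = read
        by_position[(contig, pos, strand)].append(read)

    kept = []
    for key in sorted(by_position):
        representatives = []
        for read in by_position[key]:
            umi = read[4]
            # Keep the read only if its UMI is far from every kept UMI here.
            if all(hamming(umi, rep[4]) > max_mismatches for rep in representatives):
                representatives.append(read)
        kept.extend(representatives)
    return kept


if __name__ == "__main__":
    example = [
        ("r1", "ref", 100, "+", "ACGTACGTA"),
        ("r2", "ref", 100, "+", "ACGTACGTT"),  # one mismatch from r1
        ("r3", "ref", 100, "+", "TTTTTTTTT"),  # distinct UMI, same position
        ("r4", "ref", 250, "+", "ACGTACGTA"),  # same UMI, different position
    ]
    for read in dedup_by_umi(example):
        print(read[0])
```

In this toy input, `r2` is treated as a duplicate of `r1`, while `r3` and `r4` survive because they differ by UMI and by position respectively. A real pipeline would also need to consider mate coordinates and choose which read of a group to keep (for example by base quality), which this sketch ignores.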
All of this is in support of tiled amplicon sequencing and iSNV analysis.