This repository was archived by the owner on Jul 17, 2023. It is now read-only.
CHANGELOG.md (21 additions, 0 deletions)
@@ -4,6 +4,27 @@
 - Add a configurable option to allow overlapping pairs to be used as evidence (MANTA-1398)
   - The option is available in the configure file configureManta.py.ini

+### Changed
+- Change SV candidate contig aligners to improve precision (MANTA-1396)
+  - Change contig aligners such that variant occurrences are more heavily penalized.
+- Fix multi-junction nomination (MANTA-1430)
+  - Complex events with more than two junctions are no longer nominated as a group
+  - Fix the problem of duplicate detection of the same SV candidate
+- Add index to ensure uniqueness of evidence bam filenames (MANTA-1431)
+  - It solves the potential problem of name conflicts for evidence bams if the input bam files have the same name while located in different directories.
+- Change filters for easy interpretation of multi-sample germline variant vcf (MANTA-1343)
+  - Add record-level filter 'SampleFT' when no sample passes all sample level filters
+  - Add sample-level filter 'HomRef' for homozygous reference calls
+  - Sample-level filters are no longer applied at the record level, even if they apply to all samples
+- Change representation of inversions in the VCF output (MANTA-1385)
+  - Intrachromosomal translocations with inverted breakpoints are now reported as two breakend (BND) records.
+  - Previously they were reported in the VCF using the inversion (INV) allele type.
+
+### Fixed
+- Fix the bug of stats generation with short reference sequences (MANTA-1459/[#143])
+- Fix the evidence significance test in the multi-sample calling mode (MANTA-1294)
+  - This issue previously caused spurious false negatives during the multi-sample calling mode. The incidence rate of the problem tended to increase with sample count.
+
 ## v1.4.0 - 2018-04-25

 This is a major bugfix update from v1.3.2, featuring improved precision and vcf representation, in addition to minor user-friendly improvements.
 Finally, a greedy procedure is applied to select the constructed contigs in the order of the number of effective supporting reads and contig length. An effective supporting read cannot be a pseudo read, nor can it support any contig that has been selected previously. The selection process is repeated until no contig remains with the minimum number of effective supporting reads (defaults to 2), or the maximum number of assembled contigs (defaults to 10) is reached.
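The greedy selection described above can be sketched as follows. This is a minimal illustration rather than the Manta implementation: the `Contig` container, the `select_contigs` function and its parameter names are hypothetical, and ranking by effective read count first with contig length as the tie-breaker is an assumed reading of "in the order of the number of effective supporting reads and contig length".

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical container: an assembled sequence and the ids of its supporting reads."""
    seq: str
    supporting_reads: frozenset


def select_contigs(contigs, pseudo_reads=frozenset(), min_support=2, max_contigs=10):
    """Greedily pick contigs until support runs out or the contig cap is reached."""
    used_reads = set()          # reads already credited to a selected contig
    remaining = list(contigs)
    selected = []

    def effective(contig):
        # Effective reads: not pseudo reads, and not supporting an earlier selection.
        return contig.supporting_reads - pseudo_reads - used_reads

    while remaining and len(selected) < max_contigs:
        # Assumed ordering: most effective reads first, longer contig breaks ties.
        best = max(remaining, key=lambda c: (len(effective(c)), len(c.seq)))
        if len(effective(best)) < min_support:
            break               # no remaining contig can reach the minimum support
        used_reads.update(effective(best))
        selected.append(best)
        remaining = [c for c in remaining if c is not best]
    return selected
```

Because reads credited to an earlier contig are excluded from later counts, a contig that shares most of its reads with a selected contig is unlikely to be selected afterwards.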

-\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}
+\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}.

 \begin{figure}[!tpb]
 \centerline{
@@ -362,15 +362,15 @@ \subsubsection{Contig alignment for large SVs} For large SV candidates spanning
 \label{fig:jumpstate}
 \end{figure}

-The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the 'jump' state are -24 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state are required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
+The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the `jump' state are -100 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state is required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria, the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
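The flanking-segment criteria above can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that the "perfect match score" of a flank is the match score (2) times the flank length in bases; the text does not spell out that definition.

```python
MATCH_SCORE = 2            # match score from the text
MIN_FLANK_LEN = 30         # each flank must extend at least 30 bases
MIN_SCORE_FRACTION = 0.75  # flank score must be at least 75% of a perfect match


def flank_passes(flank_len, flank_score):
    """Check one flanking alignment segment against the length and score criteria."""
    # Assumption: a perfect flank scores MATCH_SCORE for every aligned base.
    perfect_score = MATCH_SCORE * flank_len
    return flank_len >= MIN_FLANK_LEN and flank_score >= MIN_SCORE_FRACTION * perfect_score


def jump_alignment_usable(left_flank, right_flank, passes_through_jump):
    """An alignment is used only if it transits the jump state and both flanks pass.

    Each flank is a (length, score) tuple.
    """
    return (passes_through_jump
            and flank_passes(*left_flank)
            and flank_passes(*right_flank))
```

Under these assumptions a 40-base flank has a perfect score of 80 and needs at least 60 to pass. With a mismatch score of -8, two mismatches give 38 x 2 - 16 = 60, which passes, while three give 37 x 2 - 24 = 50, which fails.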


 \subsubsection{Contig alignment for complex region candidates}
-Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes are possible. This is reflected in the alignment procedure for complex region contigs, which are checked against two aligners optimized for large and small indels respectively.
+Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes is possible. This is reflected in the indel aligner, which handles both small and large indels.

-A contig is first aligned with the large indel aligner and only checked for small indels if no large indels are found. The structure of the large indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states are added for large indels. Alignment scores for standard alignment states are (2, -8, -18, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for 'large' gaps are -24 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states. Variants are only reported from the large indel aligner if an insertion of at least 80 bases or a deletion of at least 200 bases is found. The flanking alignment quality criteria described above for large SVs is also applied to filter out noisy alignments. To reduce false positive calls in repetitive regions an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region.
+The indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states is added for large indels. Alignment scores for standard alignment states are (2, -8, -24, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for `large' gaps are -100 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states.
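To see how the two gap tiers interact, the following sketch compares the score of a single gap under the standard and `large' states. The function names are hypothetical, and the sketch assumes the extend score is charged for every gap base including the first; if the first base is covered by the open score alone, the crossover length shifts by one.

```python
STD_OPEN, STD_EXTEND = -24, -1      # standard gap open / extend scores from the text
LARGE_OPEN, LARGE_EXTEND = -100, 0  # 'large' gap open / extend scores from the text


def std_gap_score(length):
    """Score of a standard gap (assumed convention: open + extend * length)."""
    return STD_OPEN + STD_EXTEND * length


def large_gap_score(length):
    """Score of a 'large' gap: a flat cost, since the extend score is 0."""
    return LARGE_OPEN + LARGE_EXTEND * length


def best_gap_state(length):
    """Return which gap state scores higher for a gap of the given length."""
    # Ties go to the standard state in this sketch; that choice is arbitrary.
    return "standard" if std_gap_score(length) >= large_gap_score(length) else "large"
```

Under this convention a standard gap scores -24 - L, so it beats the flat -100 of the large state for gaps shorter than 76 bases, and the large state wins for longer gaps.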

-If the large indel aligner fails to identify a candidate meeting the size and quality criteria above, the contig is used to search for smaller indels, this time using a conventional affine gap aligner with parameters: (2,-8,-12,0) for match, mismatch, gap open, gap extend. All indels larger than the minimum indel size are identified. For each indel, the flanking contig alignment quality and uniqueness checks described above are applied to filter likely false positives, and any remaining cases become small indel candidates.
+All indels larger than the minimum indel size are identified by the indel aligner. For each indel, the flanking alignment quality criteria described above for large SVs are also applied to filter out noisy alignments. To further reduce false positive calls in repetitive regions, an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region. Among contigs meeting all quality criteria, the ones with `large' gaps are prioritized during contig selection. If more than one contig has `large' gaps, or if no contig has a `large' gap, the contig with the highest alignment score is selected.
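The contig prioritization in the last two sentences can be expressed as a single ordering. This is a sketch under assumptions: `AlignedContig` and `pick_contig` are hypothetical names, the candidates are taken to have already passed all quality filters, and when several contigs have `large' gaps the highest-scoring one among them is assumed to be chosen.

```python
from dataclasses import dataclass


@dataclass
class AlignedContig:
    """Hypothetical record for a contig that already passed all quality filters."""
    name: str
    score: int
    has_large_gap: bool


def pick_contig(candidates):
    """Prefer contigs with a 'large' gap, then the highest alignment score."""
    if not candidates:
        return None
    # True sorts above False, so large-gap contigs outrank all others;
    # the alignment score decides within each group.
    return max(candidates, key=lambda c: (c.has_large_gap, c.score))
```

With this ordering, a lower-scoring contig with a `large' gap is selected over a higher-scoring contig without one, which matches the stated priority.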