This repository was archived by the owner on Jul 17, 2023. It is now read-only.

Commit 261e6a6

Merge branch 'develop' into feature-MANTA-1398
2 parents 296ad36 + 8df4f99 commit 261e6a6

57 files changed: +991 / -556 lines

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 
 #
 # Using sudo-false/container-based tests for greater (linux) test responsiveness. This doesn't seem
-# to effect the queing time for OSX tests.
+# to affect the queueing time for OSX tests.
 #
 
 dist: trusty

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -4,6 +4,27 @@
 - Add a configurable option to allow overlapping pairs to be used as evidence (MANTA-1398)
 - The option is available in the configure file configureManta.py.ini
 
+### Changed
+- Change SV candidate contig aligners to improve precision (MANTA-1396)
+- Change contig aligners such that variant occurrences are more heavily penalized.
+- Fix multi-junction nomination (MANTA-1430)
+- Complex events with more than two junctions are no longer nominated as a group
+- Fix the problem of duplicate detection of the same SV candidate
+- Add index to ensure uniqueness of evidence bam filenames (MANTA-1431)
+- It solves the potential problem of name conflicts for evidence bams if the input bam files have the same name while located in different directories.
+- Change filters for easy interpretation of multi-sample germline variant vcf (MANTA-1343)
+- Add record-level filter 'SampleFT' when no sample passes all sample-level filters
+- Add sample-level filter 'HomRef' for homozygous reference calls
+- No more sample-level filter will be applied at the record level even if it applies to all samples
+- Change representation of inversions in the VCF output (MANTA-1385)
+- Intrachromosomal translocations with inverted breakpoints are now reported as two breakend (BND) records.
+- Previously they were reported in the VCF using the inversion (INV) allele type.
+
+### Fixed
+- Fix the bug of stats generation with short reference sequences (MANTA-1459/[#143])
+- Fix the evidence significance test in the multi-sample calling mode (MANTA-1294)
+- This issue previously caused spurious false negatives during the multi-sample calling mode. The incidence rate of the problem tended to increase with sample count.
+
 ## v1.4.0 - 2018-04-25
 
 This is a major bugfix update from v1.3.2, featuring improved precision and vcf representation, in addition to minor user friendly improvements.
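The MANTA-1343 filter rules listed in the changelog hunk above can be illustrated with a minimal Python sketch. The `Record` and `SampleCall` classes, the `"0/0"` encoding of a homozygous reference genotype, and the filter name `ExampleFilter` are hypothetical and exist only for illustration; this is not Manta's VCF-writing code. The sketch also assumes that `HomRef` counts as a failed sample-level filter when deciding whether the record-level `SampleFT` filter applies.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified model of one multi-sample VCF record.
@dataclass
class SampleCall:
    genotype: str                                      # e.g. "0/0", "0/1", "1/1"
    filters: List[str] = field(default_factory=list)   # sample-level FT values


@dataclass
class Record:
    samples: List[SampleCall]
    filters: List[str] = field(default_factory=list)   # record-level FILTER values


def apply_sample_filters(record: Record) -> Record:
    """Sketch of the changelog rules: HomRef per sample, SampleFT per record."""
    for sample in record.samples:
        # Assumption: homozygous reference calls are encoded as "0/0".
        if sample.genotype == "0/0" and "HomRef" not in sample.filters:
            sample.filters.append("HomRef")
    # A sample "passes" when it carries no sample-level filter. SampleFT is
    # added only if no sample passes; the individual sample-level filters
    # are never copied up to the record level.
    if record.samples and all(s.filters for s in record.samples):
        if "SampleFT" not in record.filters:
            record.filters.append("SampleFT")
    return record
```

Keeping the specific reasons in each sample's FT field and exposing only the `SampleFT` summary at the record level mirrors the last bullet of that changelog entry.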

README.md

Lines changed: 2 additions & 2 deletions
@@ -31,8 +31,8 @@ indels for germline and cancer sequencing applications. *Bioinformatics*,
 
 ...and the corresponding [open-access pre-print][preprint].
 
-[bpaper]:https://dx.doi.org/10.1093/bioinformatics/btv710
-[preprint]:http://dx.doi.org/10.1101/024232
+[bpaper]:https://doi.org/10.1093/bioinformatics/btv710
+[preprint]:https://doi.org/10.1101/024232
 
 
 License

docs/developerGuide/README.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ Manta Developer Guide
 * [Commit messages](#commit-messages)
 * [Commit consolidation](#commit-consolidation)
 * [Changelog conventions](#changelog-conventions)
-* [Branching and release tagging guidelines](#branching-and-release-tagging-guidelines)
+* [Branching and release tagging guidelines](#branching-and-release-tagging-guidelines)
 * [Error handling](#error-handling)
 * [General Policies](#general-policies)
 * [Exception Details](#exception-details)
@@ -240,7 +240,7 @@ prior to merging the branch.
 longer, for instance by starting all major bullet points with an imperative verb.
 
 
-## Branching and release tagging guidelines
+### Branching and release tagging guidelines
 
 All features and bugfixes are developed on separate branches. Branch names should contain the corresponding JIRA ticket
 id or contain the key "github${issueNumber}" to refer to the corresponding issue on github.com. After code
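The branch naming convention in the developer guide hunk above can be checked mechanically. This is a minimal sketch, assuming JIRA ids take the form `MANTA-<number>` (the key used by the tickets in this commit) and that `github${issueNumber}` expands to the literal `github` followed by digits; `branch_name_follows_convention` is a hypothetical helper, not part of the Manta repository.

```python
import re

# Assumed patterns: "MANTA-1398"-style JIRA ids, or "github" + issue number.
_JIRA_ID = re.compile(r"MANTA-\d+")
_GITHUB_ID = re.compile(r"github\d+")


def branch_name_follows_convention(name: str) -> bool:
    """Return True if the branch name embeds a JIRA id or a github<N> key."""
    return bool(_JIRA_ID.search(name) or _GITHUB_ID.search(name))
```

The check is only meaningful for feature and bugfix branches; long-lived branches such as `develop` are outside the convention.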

docs/methods/primary/methods.tex

Lines changed: 5 additions & 5 deletions
@@ -350,7 +350,7 @@ \subsubsection{Contig assembly}
 
 Finally, a greedy procedure is applied to select the constructed contigs in the order of the number of effective supporting reads and contig length. An effective supporting read cannot be a pseudo read, nor support any contigs that have been selected previously. The selection process is repeated until there is no more contig available with the minimum number of effective supporting reads (defaults to 2), or the maximum number of assembled contigs (defaults to 10) is met.
 
-\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}
+\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}.
 
 \begin{figure}[!tpb]
 \centerline{
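The jump-state scheme described in the "Contig alignment for large SVs" paragraph can be illustrated with a score-only dynamic program. This is a simplified Python sketch, not Manta's C++ aligner, and it makes several assumptions: a gap of length L scores open + (L-1) x extend; "switching at no cost" means an insertion/deletion switch pays only the extend score; the jump is a single silent transition, so its 0 extend score is not modelled; the whole contig must be aligned while the reference start and end are free; and the alignment must begin and end in a match. Scores follow the new (+) text: (2,-8,-12,-1) and a jump score of -100.

```python
NEG_INF = float("-inf")

# (match, mismatch, gap open, gap extend) and jump-entry score from the text.
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 2, -8, -12, -1
JUMP_OPEN = -100  # assumption: jump is one silent transition


def _segment(query, ref, entry, insert_entry):
    """Affine-gap DP of `query` against one reference segment.

    entry[i] is the best score after consuming query[:i] before this segment
    starts; it may feed any reference column (free reference start).
    Returns M, where M[i][j] scores query[i-1] aligned to ref[j-1].
    """
    n, m = len(query), len(ref)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # consumes query only
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # consumes reference only
    for i in range(1, n + 1):
        for j in range(m + 1):
            if j > 0:
                s = MATCH if query[i - 1] == ref[j - 1] else MISMATCH
                M[i][j] = s + max(entry[i - 1], M[i - 1][j - 1],
                                  I[i - 1][j - 1], D[i - 1][j - 1])
            # First gap base pays GAP_OPEN; I<->D switches pay only GAP_EXTEND.
            I[i][j] = max(M[i - 1][j] + GAP_OPEN,
                          I[i - 1][j] + GAP_EXTEND,
                          D[i - 1][j] + GAP_EXTEND,
                          entry[i - 1] + GAP_OPEN if insert_entry else NEG_INF)
            if j > 0:
                D[i][j] = max(M[i][j - 1] + GAP_OPEN,
                              D[i][j - 1] + GAP_EXTEND,
                              I[i][j - 1] + GAP_EXTEND)
    return M


def jump_align_score(contig, ref1, ref2):
    """Best score for contig = (part on ref1) + jump + (part on ref2)."""
    n = len(contig)
    begin = [0.0] + [NEG_INF] * n  # alignment starts at contig base 0
    m1 = _segment(contig, ref1, begin, insert_entry=False)
    # The jump can only be entered from the match state of segment 1...
    jump = [max(row) + JUMP_OPEN for row in m1]
    # ...and exits only to the match or insert state of segment 2.
    m2 = _segment(contig, ref2, jump, insert_entry=True)
    return max(m2[n])  # free reference end; must end in a match
```

For a contig built from 30 bases of `ref1` followed by 30 bases of `ref2`, every base matches and the single jump costs -100, so the best score is 60 x 2 - 100 = 20. Raising the jump score from -24 to -100 means a breakpoint is only worth taking when the flanking matches recover that penalty.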
@@ -362,15 +362,15 @@ \subsubsection{Contig alignment for large SVs} For large SV candidates spanning
 \label{fig:jumpstate}
 \end{figure}
 
-The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the 'jump' state are -24 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state are required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
+The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the `jump' state are -100 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state is required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria, the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
 
 
 \subsubsection{Contig alignment for complex region candidates}
-Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes are possible. This is reflected in the alignment procedure for complex region contigs, which are checked against two aligners optimized for large and small indels respectively.
+Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes is possible. This is reflected in the design of the indel aligner, which handles both small and large indels.
 
-A contig is first aligned with the large indel aligner and only checked for small indels if no large indels are found. The structure of the large indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states are added for large indels. Alignment scores for standard alignment states are (2, -8, -18, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for 'large' gaps are -24 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states. Variants are only reported from the large indel aligner if an insertion of at least 80 bases or a deletion of at least 200 bases is found. The flanking alignment quality criteria described above for large SVs is also applied to filter out noisy alignments. To reduce false positive calls in repetitive regions an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region.
+The indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states is added for large indels. Alignment scores for standard alignment states are (2, -8, -24, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for `large' gaps are -100 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states.
 
-If the large indel aligner fails to identify a candidate meeting the size and quality criteria above, the contig is used to search for smaller indels, this time using a conventional affine gap aligner with parameters: (2,-8,-12,0) for match, mismatch, gap open, gap extend. All indels larger than the minimum indel size are identified. For each indel, the flanking contig alignment quality and uniqueness checks described above are applied to filter likely false positives, and any remaining cases become small indel candidates.
+All indels larger than the minimum indel size are identified by the indel aligner. For each indel, the flanking alignment quality criteria described above for large SVs are also applied to filter out noisy alignments. To further reduce false positive calls in repetitive regions, an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region. Among contigs meeting all quality criteria, the ones with `large' gaps are prioritized during contig selection. If there is more than one contig with `large' gaps, or if no contig has a `large' gap, the contig with the highest alignment score is selected.
 
 \subsubsection{Large Insertions}
 
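The filtering and contig-selection rules in the new complex-region text can be sketched as follows. This is a minimal Python illustration under stated assumptions: `ContigCandidate` and its fields are hypothetical summaries of an already-computed alignment, the 500bp uniqueness check is represented by a precomputed boolean, and the "perfect match score" of a flank is taken as the match score (2) times the flank length.

```python
from dataclasses import dataclass
from typing import List, Optional

MATCH = 2  # match score; a perfect flank of length L scores MATCH * L


@dataclass
class ContigCandidate:
    # Hypothetical summary of one aligned contig; field names are illustrative.
    name: str
    score: int              # total alignment score
    has_large_gap: bool     # alignment used a `large' insertion/deletion state
    left_flank_len: int
    left_flank_score: int
    right_flank_len: int
    right_flank_score: int
    flanks_unique: bool     # assumed result of the 500bp uniqueness check


def flank_ok(length: int, score: int,
             min_len: int = 30, min_frac: float = 0.75) -> bool:
    """Flank must span >= 30 bases and score >= 75% of a perfect match."""
    return length >= min_len and score >= min_frac * MATCH * length


def select_contig(cands: List[ContigCandidate]) -> Optional[ContigCandidate]:
    passing = [c for c in cands
               if c.flanks_unique
               and flank_ok(c.left_flank_len, c.left_flank_score)
               and flank_ok(c.right_flank_len, c.right_flank_score)]
    if not passing:
        return None
    # Contigs with `large' gaps are preferred; ties go to the highest score.
    return max(passing, key=lambda c: (c.has_large_gap, c.score))
```

Using the tuple `(has_large_gap, score)` as the sort key expresses both rules at once: a large-gap contig always outranks one without, and within each group the alignment score decides.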