Showing 748 changed files with 42,105 additions and 32,998 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
Truvari API QuickStart | ||
====================== | ||
|
||
Truvari provides functionality to facilitate the comparison of structural variants (SVs) in VCF variant records. Developers can easily leverage this functionality by replacing calls to `pysam.VariantFile` with `truvari.VariantFile`. The `truvari.VariantFile` retains all `pysam` functionality. | ||
|
||
.. code-block:: python | ||
import truvari | ||
vcf = truvari.VariantFile("input.vcf.gz") | ||
for entry in vcf: | ||
    # Access variant's INFO fields using pysam | ||
    if 'SVTYPE' in entry.info and 'SVLEN' in entry.info: | ||
        print(entry.info['SVTYPE'], entry.info['SVLEN']) | ||
    # But these INFOs aren't always available. | ||
    # Access type/size properties of variants using truvari | ||
    print(entry.var_type(), entry.var_size()) | ||
    # Access genotype (GT) | ||
    # using pysam -> | ||
    if 'SAMPLE' in entry.samples and 'GT' in entry.samples['SAMPLE']: | ||
        print(entry.samples['SAMPLE']['GT']) | ||
    # using truvari -> | ||
    print(entry.gt('SAMPLE')) | ||
    # Calculate variant's allele frequency with truvari | ||
    print(entry.allele_freq_annos()) | ||
Details of all available functions are in the :ref:`package documentation <variant_handling>`. | ||
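As a rough illustration of what an allele-frequency calculation involves, the sketch below counts alternate alleles across a few genotype tuples in plain Python. The helper name `allele_frequency` and the sample genotypes are hypothetical; this is not Truvari's implementation, and `entry.allele_freq_annos()` may compute and report more than this single ratio.

```python
def allele_frequency(genotypes):
    """Return the ALT allele frequency over called alleles, or None if none are called.

    `genotypes` is an iterable of GT tuples such as (0, 1); None marks a missing allele.
    """
    # Flatten all genotypes and drop missing alleles
    called = [allele for gt in genotypes for allele in gt if allele is not None]
    if not called:
        return None
    # Any allele index above 0 is treated as an alternate allele
    alt_count = sum(1 for allele in called if allele > 0)
    return alt_count / len(called)


# Hypothetical genotypes for four samples: 0/1, 1/1, 0/0, ./.
sample_gts = [(0, 1), (1, 1), (0, 0), (None, None)]
af = allele_frequency(sample_gts)
print(af)
```

Missing genotypes are excluded from the denominator here, which is one common convention; other tools may handle them differently.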
|
||
Comparing Variants | ||
------------------ | ||
|
||
The `truvari.VariantRecord` simplifies comparing two VCF entries. | ||
|
||
.. code-block:: python | ||
# Given two `truvari.VariantRecords`, entry1 and entry2 | ||
match = entry1.match(entry2) | ||
print("Entries' Sequence Similarity:", match.seqsim) | ||
print("Entries' Size Similarity:", match.sizesim) | ||
print("Is the match above thresholds:", match.state) | ||
Calling `entry1.match(entry2)` returns a :ref:`truvari.MatchResult <match_result>`. You can customize matching thresholds by providing :ref:`truvari.VariantParams <variant_params>` to the `truvari.VariantFile`. | ||
|
||
.. code-block:: python | ||
# Disable sequence and size similarity; enable reciprocal overlap | ||
p = truvari.VariantParams(pctseq=0, pctsize=0, pctovl=0.5) | ||
vcf = truvari.VariantFile("input.vcf.gz", params=p) | ||
entry1 = next(vcf) | ||
entry2 = next(vcf) | ||
match = entry1.match(entry2) | ||
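To make the thresholds above more concrete, here is a minimal plain-Python sketch of two of the metrics involved: size similarity and reciprocal overlap. The function names `size_similarity_sketch` and `reciprocal_overlap_sketch`, the half-open coordinates, and the example intervals are assumptions made for illustration; Truvari's own calculations may differ in detail.

```python
def size_similarity_sketch(size_a, size_b):
    """Ratio of the smaller variant size to the larger one, in [0, 1]."""
    largest = max(size_a, size_b)
    if largest == 0:
        return 1.0
    return min(size_a, size_b) / largest


def reciprocal_overlap_sketch(start_a, end_a, start_b, end_b):
    """Overlap length divided by the longer interval's length (half-open coordinates)."""
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    return overlap / max(end_a - start_a, end_b - start_b)


# Two hypothetical deletions on the same chromosome
a_start, a_end = 1000, 1400   # 400 bp
b_start, b_end = 1200, 1500   # 300 bp

size_sim = size_similarity_sketch(a_end - a_start, b_end - b_start)
rec_ovl = reciprocal_overlap_sketch(a_start, a_end, b_start, b_end)

# With a threshold of 0.5 on reciprocal overlap, this pair would just pass
pctovl = 0.5
print(size_sim, rec_ovl, rec_ovl >= pctovl)
```

Dividing by the longer interval means the reported value is the smaller of the two per-interval overlap fractions, so a single threshold applies to both variants.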
Filtering Variants | ||
------------------ | ||
|
||
The `truvari.VariantParams` provides parameters for filtering variants, such as minimum or maximum SV sizes. | ||
|
||
.. code-block:: python | ||
p = truvari.VariantParams(sizemin=200, sizemax=500) | ||
vcf = truvari.VariantFile("input.vcf.gz", params=p) | ||
# Retrieve all variant records within sizemin and sizemax | ||
results = [entry for entry in vcf if not entry.filter_size()] | ||
Additional filters, such as excluding monomorphic reference sites or single-end BNDs, can be applied using `entry.filter_call()`. | ||
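The list comprehension above keeps entries for which `filter_size()` is false, so the method reports whether an entry should be dropped. The sketch below mimics that convention on bare integers. The helper `filter_size_sketch` is hypothetical, and treating both bounds as inclusive is an assumption rather than documented Truvari behavior.

```python
def filter_size_sketch(size, sizemin=200, sizemax=500):
    """Return True when `size` falls outside [sizemin, sizemax] and should be dropped."""
    return size < sizemin or size > sizemax


# Hypothetical variant sizes in base pairs
sizes = [50, 200, 350, 500, 12000]

# Same pattern as the comprehension above: keep what is NOT filtered
kept = [size for size in sizes if not filter_size_sketch(size)]
print(kept)
```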
|
||
Subsetting to Regions | ||
--------------------- | ||
|
||
To subset a VCF to regions specified in a BED file, use: | ||
|
||
.. code-block:: python | ||
for entry in vcf.fetch_bed("regions.bed"): | ||
    print("Entry's variant type:", entry.var_type()) | ||
    print("Entry's variant size:", entry.var_size()) | ||
If your regions of interest are stored in an in-memory object instead of a BED file, use the `.fetch_regions` method: | ||
|
||
.. code-block:: python | ||
from collections import defaultdict | ||
from pyintervaltree import IntervalTree | ||
tree = defaultdict(IntervalTree) | ||
tree['chr1'].addi(10, 100) | ||
tree['chr2'].addi(2000, 2200) | ||
count = 0 | ||
for entry in vcf.fetch_regions(tree): | ||
    count += 1 | ||
print(f"Total of {count} variants") | ||
To iterate over variants that are not within the regions, use `vcf.fetch_regions(tree, inside=False)`. Both
fetch methods use heuristics to choose the more efficient strategy, either seeking through the VCF file or
streaming the entire file. | ||
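The inside/outside behavior can be pictured with a small stand-alone sketch that uses plain tuples instead of an interval tree. The names `in_regions` and `fetch_regions_sketch` and the example variants are hypothetical, and requiring a variant to lie entirely within a region is an assumption; Truvari's boundary handling may differ.

```python
from collections import defaultdict

# Regions keyed by chromosome, as (start, end) tuples
regions = defaultdict(list)
regions["chr1"].append((10, 100))
regions["chr2"].append((2000, 2200))


def in_regions(chrom, start, end, regions):
    """True when [start, end) lies entirely within one region on `chrom`."""
    return any(r_start <= start and end <= r_end
               for r_start, r_end in regions.get(chrom, []))


def fetch_regions_sketch(variants, regions, inside=True):
    """Yield variants within the regions, or those outside them when inside=False."""
    for variant in variants:
        if in_regions(*variant, regions) == inside:
            yield variant


# Hypothetical variants as (chrom, start, end)
variants = [("chr1", 20, 60), ("chr1", 90, 150), ("chr2", 2050, 2100), ("chr3", 5, 10)]

inside_hits = list(fetch_regions_sketch(variants, regions))
outside_hits = list(fetch_regions_sketch(variants, regions, inside=False))
print(len(inside_hits), len(outside_hits))
```

This sketch always streams every variant; it does not attempt the seek-versus-stream choice described above.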
|
||
Parsing BND Information | ||
----------------------- | ||
|
||
Truvari also simplifies parsing BND information from VCF entries: | ||
|
||
.. code-block:: python | ||
# Example entry: | ||
# chr1 23272628 SV_1 G G]chr5:52747359] . PASS SVTYPE=BND;EVENTTYPE=TRA:UNBALANCED;SUBCLONAL=n;COMPLEX=n;MATEID=SV_171 GT:PSL:PSO 0/1:.:. | ||
print(entry.bnd_position()) | ||
# ('chr5', 52747359) | ||
print(entry.bnd_direction_strand()) | ||
# ('right', 'direct') | ||
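For readers curious about what sits behind the mate position, the sketch below extracts the chromosome and coordinate from a BND ALT string with a regular expression. The pattern and the helper `parse_bnd_alt` are hypothetical and cover only the simple bracketed forms; this is not Truvari's parser, and it does not derive the direction or strand values shown above.

```python
import re

# Optional leading bases, a bracket, "chrom:pos", the same bracket again, optional trailing bases.
# Contig names that contain ':' are not handled by this simplified pattern.
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)"
    r"(?P<bracket>[\[\]])"
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P=bracket)"
    r"(?P<trail>[ACGTNacgtn.]*)$"
)


def parse_bnd_alt(alt):
    """Return (chrom, pos) of the mate breakend encoded in a BND ALT string."""
    found = BND_ALT.match(alt)
    if found is None:
        raise ValueError(f"Not a recognized BND ALT: {alt!r}")
    return found.group("chrom"), int(found.group("pos"))


# ALT field from the example entry above
print(parse_bnd_alt("G]chr5:52747359]"))
# ('chr5', 52747359)
```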
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
Truvari package | ||
=============== | ||
|
||
Overview | ||
-------- | ||
.. automodule:: truvari | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
.. _variant_handling: | ||
|
||
Variant Handling | ||
---------------- | ||
|
||
.. autoclass:: VariantFile | ||
:members: | ||
|
||
.. autoclass:: VariantRecord | ||
:members: | ||
|
||
.. _variant_params: | ||
|
||
.. autoclass:: VariantParams | ||
:members: | ||
|
||
Objects | ||
------- | ||
|
||
.. _match_result: | ||
|
||
.. autoclass:: MatchResult | ||
:members: | ||
|
||
.. autoclass:: GT | ||
:members: | ||
|
||
.. autoclass:: SV | ||
:members: | ||
|
||
.. autoclass:: Bench | ||
:members: | ||
|
||
.. autoclass:: BenchOutput | ||
:members: | ||
|
||
.. autoclass:: StatsBox | ||
:members: | ||
|
||
.. autoclass:: LogFileStderr | ||
:members: | ||
|
||
Extra Methods | ||
------------- | ||
.. autofunction:: bed_ranges | ||
|
||
.. autofunction:: benchdir_count_entries | ||
|
||
.. autofunction:: best_seqsim | ||
|
||
.. autofunction:: build_region_tree | ||
|
||
.. autofunction:: read_bed_tree | ||
|
||
.. autofunction:: check_vcf_index | ||
|
||
.. autofunction:: chunker | ||
|
||
.. autofunction:: cmd_exe | ||
|
||
.. autofunction:: compress_index_vcf | ||
|
||
.. autofunction:: coords_within | ||
|
||
.. autofunction:: count_entries | ||
|
||
.. autofunction:: extend_region_tree | ||
|
||
.. autofunction:: file_zipper | ||
|
||
.. autofunction:: help_unknown_cmd | ||
|
||
.. autofunction:: get_gt | ||
|
||
.. autofunction:: get_scalebin | ||
|
||
.. autofunction:: get_sizebin | ||
|
||
.. autofunction:: get_svtype | ||
|
||
.. autofunction:: make_temp_filename | ||
|
||
.. autofunction:: merge_region_tree_overlaps | ||
|
||
.. autofunction:: msa2vcf | ||
|
||
.. autofunction:: opt_gz_open | ||
|
||
.. autofunction:: optimize_df_memory | ||
|
||
.. autofunction:: overlap_percent | ||
|
||
.. autofunction:: overlaps | ||
|
||
.. autofunction:: performance_metrics | ||
|
||
.. autofunction:: phab | ||
|
||
.. autofunction:: reciprocal_overlap | ||
|
||
.. autofunction:: restricted_float | ||
|
||
.. autofunction:: restricted_int | ||
|
||
.. autofunction:: ref_ranges | ||
|
||
.. autofunction:: roll_seqsim | ||
|
||
.. autofunction:: seqsim | ||
|
||
.. autofunction:: setup_logging | ||
|
||
.. autofunction:: sizesim | ||
|
||
.. autofunction:: unroll_seqsim | ||
|
||
.. autofunction:: vcf_ranges | ||
|
||
.. autofunction:: vcf_to_df | ||
|
||
Data | ||
---- | ||
HEADERMAT | ||
^^^^^^^^^ | ||
Regular expression matching VCF header INFO/FORMAT fields, with capture groups | ||
|
||
.. autodata:: HEADERMAT | ||
|
||
|
||
QUALBINS | ||
^^^^^^^^ | ||
0-100 quality score bin strings (step size 10) | ||
|
||
.. autodata:: QUALBINS | ||
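As an illustration of what quality-score bin strings with a step of 10 could look like, the sketch below builds a list of labels and assigns a score to one of them. The label format, the open-ended final bin, and the helper names are assumptions; consult the rendered `QUALBINS` value for the strings Truvari actually uses.

```python
def quality_bins(step=10, maximum=100):
    """Build half-open bin labels from 0 to `maximum`, plus one open-ended bin."""
    bins = [f"[{low},{low + step})" for low in range(0, maximum, step)]
    bins.append(f">={maximum}")
    return bins


def quality_bin_for(score, step=10, maximum=100):
    """Return the label of the bin that a non-negative `score` falls into."""
    labels = quality_bins(step, maximum)
    if score >= maximum:
        return labels[-1]
    return labels[int(score // step)]


labels = quality_bins()
print(len(labels), quality_bin_for(37.5))
```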
|
||
SVTYTYPE | ||
^^^^^^^^ | ||
:class:`pandas.CategoricalDtype` of :class:`truvari.SV` | ||
|
||
SZBINMAX | ||
^^^^^^^^ | ||
integer list of maximum size for size bins | ||
|
||
.. autodata:: SZBINMAX | ||
|
||
SZBINS | ||
^^^^^^ | ||
string list of size bins | ||
|
||
.. autodata:: SZBINS | ||
|
||
SZBINTYPE | ||
^^^^^^^^^ | ||
:class:`pandas.CategoricalDtype` of :data:`truvari.SZBINS` |