- Reads a VCF file into an object of the `VCBerry` class.
- `VCBerry` generates the following pandas DataFrames:
  - `VCBerry.snps` contains only SNPs.
  - `VCBerry.indels` contains only INDELs.
  - `VCBerry.monomeric` contains invariant sites.
- Use `VCBerry` from another module:

```python
new_df = VCBerry(vcfile)         # VCBerry object built from the VCF at vcfile
function_return = new_df.snps    # pandas DataFrame of SNPs only
```
- Other attributes of a `VCBerry` object:
  - `VCBerry.allvars` is a combined SNP/INDEL DataFrame.
  - `VCBerry.header` holds the original VCF header, for compiling a new VCF.
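The split VCBerry performs can be approximated in a few lines of pandas. This is a minimal sketch, not VCBerry's source: `read_vcf`, `classify`, and `split_variants` are hypothetical names, only the eight fixed VCF columns are kept, and the classification rule is an assumption (a missing ALT, `.`, marks an invariant site; all single-base alleles mark a SNP; anything else, including multi-nucleotide and symbolic alleles, is treated as an INDEL).

```python
import io

import pandas as pd

# The eight fixed columns every VCF data line starts with.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def read_vcf(handle):
    """Return (header_lines, records) from an open VCF text handle.

    Hypothetical helper: sample columns after INFO are dropped here.
    """
    header, rows = [], []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#"):
            header.append(line)  # '##' meta lines plus the '#CHROM' line
        elif line:
            rows.append(line.split("\t")[:8])
    records = pd.DataFrame(rows, columns=VCF_COLUMNS)
    records["POS"] = records["POS"].astype(int)
    return header, records


def classify(ref, alt):
    """Assumed rule: no ALT -> monomeric, all single-base alleles -> snp, else indel."""
    if alt in (".", ""):
        return "monomeric"
    alleles = [ref] + alt.split(",")
    return "snp" if all(len(a) == 1 for a in alleles) else "indel"


def split_variants(records):
    """Split records into SNP, INDEL, invariant, and combined SNP+INDEL tables."""
    kinds = [classify(r, a) for r, a in zip(records["REF"], records["ALT"])]
    typed = records.assign(TYPE=kinds)

    def subset(mask):
        return typed[mask].reset_index(drop=True)

    snps = subset(typed["TYPE"] == "snp")
    indels = subset(typed["TYPE"] == "indel")
    monomeric = subset(typed["TYPE"] == "monomeric")
    allvars = subset(typed["TYPE"] != "monomeric")  # keeps original file order
    return snps, indels, monomeric, allvars


example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.\n"
    "chr1\t300\t.\tC\t.\t50\tPASS\t.\n"
)
header, records = read_vcf(example)
snps, indels, monomeric, allvars = split_variants(records)
print(len(snps), len(indels), len(monomeric), len(allvars))  # 1 1 1 2
```

Building `allvars` from every non-invariant row keeps the file order, which is convenient when records are written back out under the original header.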
- Papaya is a Jupyter notebook that visualizes the positions of variants on a chromosome.
- The frequency of variants at binned positions along the chromosome is visualized with a Manhattan plot and a heat map.
- Papaya also plots the nucleotide change frequencies across all variants.
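The binned variant counts and nucleotide-change frequencies described above can be computed before any plotting. A minimal sketch, assuming 1-based VCF positions and a fixed bin width; `binned_counts` and `substitution_counts` are hypothetical names, not Papaya's code. Empty bins are kept as zeros so consecutive bins line up as one row of a heat map, and only single-base REF/ALT pairs are counted as nucleotide changes.

```python
import numpy as np
import pandas as pd


def binned_counts(positions, bin_size):
    """Count variants in fixed-width bins along one chromosome.

    Positions are 1-based (as in VCF POS); empty bins are kept as zeros.
    """
    bin_index = (np.asarray(positions, dtype=int) - 1) // bin_size
    counts = np.bincount(bin_index)
    starts = np.arange(len(counts)) * bin_size + 1
    return pd.DataFrame({"bin_start": starts, "n_variants": counts})


def substitution_counts(refs, alts):
    """Count REF>ALT nucleotide changes (e.g. 'A>G') among single-base variants."""
    changes = [
        f"{r.upper()}>{a.upper()}"
        for r, a in zip(refs, alts)
        # Skips INDELs, multi-allelic ALTs such as 'G,T', and '.' (no ALT).
        if len(r) == 1 and len(a) == 1 and a.upper() in "ACGT"
    ]
    return pd.Series(changes, dtype="object").value_counts()


bins = binned_counts([1, 5, 10, 11, 35], bin_size=10)
print(bins["n_variants"].tolist())  # [3, 1, 0, 1]
print(substitution_counts(["A", "A", "C", "AT"], ["G", "G", "T", "A"]).to_dict())
```

The resulting tables can be handed to any plotting library; a Manhattan-style plot would draw `n_variants` against `bin_start` for each chromosome.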
- Jackfruit uses the JASPAR database to identify variants within transcription factor binding sites.
- Jackfruit takes a `VCBerry.snps` DataFrame as input and outputs a dictionary of reference and alternate TF binding motifs.
- An INDEL version of Jackfruit, named Durian.py, is in progress.
- **Ideal output would be a .tsv file.**
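One common way to decide whether a SNP falls in a transcription factor binding site is to score the reference and alternate sequence around it with a position weight matrix. The sketch below assumes that approach; it is not Jackfruit's implementation. JASPAR distributes motifs as position frequency (count) matrices, and here one is converted to log2-odds scores with an arbitrary pseudocount of 0.5 and a uniform background. `pfm_to_pwm`, `best_score`, and `snp_motif_scores` are illustrative names, only the forward strand is scanned, and sequences are assumed to be uppercase `ACGT`.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Turn a count matrix {base: [count per position]} into log2-odds scores."""
    width = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(width):
        total = sum(pfm[b][i] for b in BASES) + 4 * pseudocount
        for b in BASES:
            prob = (pfm[b][i] + pseudocount) / total
            pwm[b].append(math.log2(prob / background))
    return pwm


def best_score(seq, pwm, must_cover):
    """Highest forward-strand PWM score among windows that contain index must_cover."""
    width = len(pwm["A"])
    first = max(0, must_cover - width + 1)
    last = min(must_cover, len(seq) - width)
    scores = [
        sum(pwm[base][i] for i, base in enumerate(seq[start:start + width]))
        for start in range(first, last + 1)
    ]
    return max(scores) if scores else None  # None: sequence shorter than the motif


def snp_motif_scores(left, ref, alt, right, pwm):
    """Score the reference and alternate versions of one SNP against one motif."""
    pos = len(left)  # 0-based index of the SNP inside the joined sequence
    return {
        "ref": best_score(left + ref + right, pwm, pos),
        "alt": best_score(left + alt + right, pwm, pos),
    }


# Toy three-base motif "TGA" (not a real JASPAR entry).
toy_pfm = {"A": [0, 0, 10], "C": [0, 0, 0], "G": [0, 10, 0], "T": [10, 0, 0]}
scores = snp_motif_scores("CCT", "G", "C", "ACC", pfm_to_pwm(toy_pfm))
print(scores["ref"] > scores["alt"])  # True: the G>C change breaks the TGA match
```

A full scan would repeat this for every JASPAR matrix and for the reverse complement, keeping motifs whose score crosses a threshold for one allele but not the other.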
- Strawberry extracts the genotype of every individual for every variant.
- Strawberry then calculates the ***
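Pulling per-sample genotypes out of a VCF line depends only on the FORMAT column and the sample columns that follow it. A minimal sketch, assuming plain-text VCF lines; `sample_names` and `genotypes` are hypothetical helpers, not Strawberry's code, and the returned GT strings are left unparsed.

```python
def sample_names(chrom_line):
    """Sample IDs are the columns after FORMAT on the '#CHROM' header line."""
    return chrom_line.rstrip("\n").split("\t")[9:]


def genotypes(record_line, samples):
    """Map each sample name to its GT string (e.g. '0/1', '1|1') for one VCF line."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return {}  # sites-only VCF: no FORMAT or sample columns
    keys = fields[8].split(":")  # FORMAT column, e.g. 'GT:DP'
    if "GT" not in keys:
        return {name: "." for name in samples}
    gt_at = keys.index("GT")
    calls = {}
    for name, sample_field in zip(samples, fields[9:]):
        values = sample_field.split(":")
        # Trailing FORMAT values may be omitted in VCF, so guard the lookup.
        calls[name] = values[gt_at] if gt_at < len(values) else "."
    return calls


header_line = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2"
record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1|1:8"
print(genotypes(record, sample_names(header_line)))  # {'s1': '0/1', 's2': '1|1'}
```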
- Annotates variants with the reference annotation.
- Grapes reports the position of each variant on its chromosome.
- WaterMelon annotates variants using the annotated reference genome.
- LycheeMelon outputs the reference and alternate sequences.
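Annotating a variant and extracting its reference and alternate sequences both reduce to coordinate arithmetic once an annotation has been parsed. The sketch below assumes features already loaded from a GFF-style file (1-based, inclusive coordinates) and a chromosome sequence held as a Python string; `Feature`, `overlapping_features`, and `ref_alt_sequences` are hypothetical names, not the code of Grapes, WaterMelon, or LycheeMelon.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int    # 1-based, inclusive
    name: str


def overlapping_features(chrom, pos, ref, features):
    """Return features overlapping the REF allele span [pos, pos + len(ref) - 1]."""
    end = pos + len(ref) - 1
    return [f for f in features if f.chrom == chrom and f.start <= end and pos <= f.end]


def ref_alt_sequences(chrom_seq, pos, ref, alt, flank):
    """Return (ref_seq, alt_seq) around a variant, with `flank` bases on each side."""
    i = pos - 1  # 0-based index of the first REF base
    if chrom_seq[i:i + len(ref)].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    left = chrom_seq[max(0, i - flank):i]
    right = chrom_seq[i + len(ref):i + len(ref) + flank]
    return left + ref + right, left + alt + right


features = [
    Feature("chr1", 1, 50, "geneA"),
    Feature("chr1", 40, 120, "geneB"),
    Feature("chr2", 1, 500, "geneC"),
]
print([f.name for f in overlapping_features("chr1", 45, "A", features)])  # ['geneA', 'geneB']
print(ref_alt_sequences("ACGTACGTAC", 5, "A", "G", flank=2))  # ('GTACG', 'GTGCG')
```

The linear scan is fine for a handful of variants; for genome-wide annotation, an interval tree or a binary search over sorted feature starts avoids checking every feature.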
- Lychee takes the annotated reference and alternate sequences and predicts the effect of the variant.
- Lychee maps codons to amino acids, classifies each mutation as synonymous or nonsynonymous, and compares the properties of the reference and alternate amino acids.
- Lychee outputs the translated sequences and associated properties of the variants. **.txt file?**
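Classifying a coding SNV comes down to translating the reference and alternate codons. A minimal sketch using the standard genetic code (NCBI translation table 1); `translate` and `classify_snv` are hypothetical names, the CDS is assumed to be in frame and on the plus strand, and the four-way side-chain grouping is one common textbook choice standing in for whatever amino acid properties Lychee actually uses.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Assumed coarse grouping of side chains; Lychee's own property set may differ.
_GROUPS = {"nonpolar": "GAVLIMFWP", "polar": "STCYNQ", "acidic": "DE", "basic": "KRH"}
AA_CLASS = {aa: group for group, members in _GROUPS.items() for aa in members}


def translate(cds):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def classify_snv(cds, index, alt):
    """Compare reference and alternate codons for a base change at 0-based CDS index."""
    offset = index % 3
    codon_start = index - offset
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return {
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "effect": "synonymous" if ref_aa == alt_aa else "nonsynonymous",
        "ref_class": AA_CLASS.get(ref_aa, "stop"),
        "alt_class": AA_CLASS.get(alt_aa, "stop"),
    }


print(translate("ATGGCTTAA"))  # MA*
print(classify_snv("ATGGCTTAA", 5, "C")["effect"])  # synonymous (GCT -> GCC, Ala)
print(classify_snv("ATGGCTTAA", 3, "A"))  # GCT -> ACT: Ala (nonpolar) to Thr (polar)
```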
- Fruit Basket integrates all the VCFruit modules with the exception of Papaya.
- The output is a series of files that describe, analyze, and annotate a user-supplied VCF.