Josh Bauman, rotation, spring 2025: wrote bash scripts to filter GATK and PAV VCFs and R scripts to visualize this variation, and worked on validating BRAKER gene model calls between N2 and WS using BLASTP.
This script is required by the PAV_INDEL_vis_josh.R script to prune HDRs out of the SV plot.
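The pruning step can be illustrated with a minimal sketch. It assumes both inputs are tab-separated, BED-like files (chromosome, 0-based start, end) and drops every SV row that overlaps an HDR interval. The function name and file layout are hypothetical; this is not the code in the actual script, which may use a different tool or format.

```shell
# Hypothetical sketch: drop SV rows that overlap any HDR interval.
# $1 = HDR intervals (chrom, start, end), $2 = SV rows (chrom, start, end, ...).
# Both files are assumed to be tab-separated with half-open BED coordinates.
prune_hdr_overlaps() {
    awk -F'\t' '
        # First file: store every HDR interval per chromosome.
        FILENAME == ARGV[1] {
            n[$1]++
            hs[$1, n[$1]] = $2 + 0
            he[$1, n[$1]] = $3 + 0
            next
        }
        # Second file: keep a row only if it overlaps no stored HDR.
        {
            keep = 1
            for (i = 1; i <= n[$1]; i++) {
                if (($2 + 0) < he[$1, i] && hs[$1, i] < ($3 + 0)) {
                    keep = 0
                    break
                }
            }
            if (keep) print
        }
    ' "$1" "$2"
}

# Example call (file names are placeholders):
#   prune_hdr_overlaps hdr_regions.bed pav_svs.bed > pav_svs.no_hdr.bed
```

The linear scan over HDRs per chromosome is simple rather than fast; for large inputs a dedicated interval tool would be the more usual choice.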
This script plots all the PAV results from long-read assemblies (from pavmergetrim.sh), faceted by chromosome. HDRs are pruned with pav_hdr_intersect.sh.
Merges all PAV results when run in the directory that contains all of the PAV output VCFs.
Same as pavmerge_trim.sh, but includes inversions.
Converts the output of pavmerge2 into a format that can be read by PAV_INDEL_vis_josh.R.
R script that takes the output of vcf_filtering and creates the variant plot, annotated SNV plot, and data tables. It is run internally by vcf_filtering.sh.
The main script to run to generate the plot and variant data tables. Arguments are passed in this format (glc-1 locus in this case): sbatch vcf_filtering.sh c_elegans V 16115967 16276907 CB4856
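As a rough illustration of the argument format, the sketch below reads the five positional arguments in the order of the example call and builds a `chrom:start-end` region string of the kind accepted by tools such as bcftools and tabix. Reading the arguments as species, chromosome, start, end, and strain is an assumption based on that example, and `build_region` is a hypothetical helper, not a function from vcf_filtering.sh.

```shell
# Hypothetical helper: validate the five positional arguments and print a
# region string. Argument meaning is assumed from the example call:
#   species chromosome start end strain
build_region() {
    if [ "$#" -ne 5 ]; then
        echo "usage: build_region <species> <chrom> <start> <end> <strain>" >&2
        return 1
    fi
    chrom=$2
    start=$3
    end=$4

    # Coordinates must be non-negative integers.
    case "$start$end" in
        ''|*[!0-9]*)
            echo "start and end must be integers" >&2
            return 1
            ;;
    esac

    # The interval must not be reversed.
    if [ "$start" -gt "$end" ]; then
        echo "start must not exceed end" >&2
        return 1
    fi

    printf '%s:%s-%s\n' "$chrom" "$start" "$end"
}

# Example call:
#   build_region c_elegans V 16115967 16276907 CB4856
```

Validating the coordinates before the job does any work makes a malformed submission fail immediately instead of partway through filtering.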
This script takes the .Rdata file from variantvis_josh_strainspecific and creates a markdown report with the variant plot and variant tables. This was done as a workaround because I was unable to knit an HTML file on the RStudio server. Ideally this script would be revised so that all the code from variantvis_josh_strainspecific.R runs in the first markdown cell. That would require this markdown doc to inherit arguments from the vcf_filtering.sh script, which might be tricky.
Concatenates the BLASTp results for all the wi->N2 protein blasts and the N2->wi protein blasts, and adds a column for strain
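A minimal sketch of this kind of concatenation is shown here, assuming tab-separated BLAST tabular output and a file naming scheme in which the strain name is the file name minus a fixed suffix. Both the naming scheme and the function name are hypothetical and may not match what the actual script does.

```shell
# Hypothetical sketch: concatenate tabular BLAST results and append a strain
# column. $1 = suffix to strip from each file name to recover the strain;
# remaining arguments = result files.
cat_blast_with_strain() {
    suffix=$1
    shift
    for f in "$@"; do
        # e.g. CB4856_vs_N2.tsv with suffix _vs_N2.tsv gives strain CB4856
        strain=$(basename "$f" "$suffix")
        awk -F'\t' -v OFS='\t' -v strain="$strain" '{ print $0, strain }' "$f"
    done
}

# Example call (file names are placeholders):
#   cat_blast_with_strain _vs_N2.tsv results/*_vs_N2.tsv > all_wi_vs_N2.tsv
```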
Performs the BLAST search of each wi protein fasta against the N2 database (and of the N2 fasta against each wi database). Results go into blastcat.sh.
Performs BLASTx of each wi protein fasta against the N2 genome.
Makes blast databases for each wi protein fasta
A rudimentary script for performing RBBH of the CB4856 protein fasta against the N2 protein fasta, plus the reciprocal BLAST. Needs to be modularized to work on any strain, and needs to be updated to account for cases where the RBBH proteins are not always the highest hits when sorted by ascending E-value and descending bitscore.
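One possible way to make the best-hit selection explicit, rather than relying on the order of the BLAST output, is sketched below. It assumes both result files use the default 12-column tabular format (`-outfmt 6`), where column 11 is the E-value and column 12 is the bitscore. The function names are hypothetical, ties are broken arbitrarily, and this is not the existing script.

```shell
# Hypothetical sketch of reciprocal best BLAST hits (RBBH) for any pair of
# tabular result files. Assumes default -outfmt 6 columns:
#   1 = query, 2 = subject, 11 = E-value, 12 = bitscore.

# Print "query<TAB>subject" for the best hit of each query, ranking by
# ascending E-value, then descending bitscore.
best_hits() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k11,11g -k12,12nr "$1" |
        awk -F'\t' '!seen[$1]++ { print $1 "\t" $2 }'
}

# $1 = forward results (e.g. strain -> N2), $2 = reverse results (N2 -> strain).
# Prints "strain_protein<TAB>N2_protein" for every reciprocal pair.
rbbh() {
    fwd_best=$(mktemp)
    rev_best=$(mktemp)
    best_hits "$1" > "$fwd_best"
    best_hits "$2" > "$rev_best"
    awk -F'\t' '
        # Forward best hits: remember the best subject for each query.
        FILENAME == ARGV[1] { fwd[$1] = $2; next }
        # Reverse best hits: keep pairs that point back at each other.
        ($2 in fwd) && fwd[$2] == $1 { print $2 "\t" $1 }
    ' "$fwd_best" "$rev_best"
    rm -f "$fwd_best" "$rev_best"
}

# Example call (file names are placeholders):
#   rbbh CB4856_vs_N2.tsv N2_vs_CB4856.tsv > CB4856_N2_rbbh.tsv
```

Because the two input files are arguments, the same functions would work for any strain, which is one route to the modularization mentioned above. The `-g` sort key is used so that E-values in scientific notation such as 1e-80 are ordered numerically.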