GBS SNP Calling Reference Optional Pipeline (GBS-SNP-CROP)

### Introduction

The GBS SNP Calling Reference Optional Pipeline (GBS-SNP-CROP) is executed via a sequence of seven Perl scripts that integrate custom parsing and filtering procedures with well-known, vetted bioinformatic tools, giving the user full access to all intermediate files. By employing a novel strategy of SNP calling based on the correspondence of within-individual to across-population patterns of polymorphism, the pipeline is able to identify and distinguish high-confidence SNPs from both sequencing and PCR errors. The pipeline adopts a clustering strategy to build a population-tailored "Mock Reference" using the same GBS data for downstream SNP calling and genotyping. Designed for libraries of either paired-end (PE) or single-end (SE) reads of arbitrary lengths, GBS-SNP-CROP maximizes data usage by eliminating unnecessary data culling due to imposed length uniformity requirements.

GBS-SNP-CROP is a complete bioinformatics pipeline developed primarily to support curation, research, and breeding programs wishing to utilize GBS for the cost-effective genome-wide characterization of plant genetic resources, mainly in the absence of a reference genome. The pipeline, however, can also be used when a reference genome is available, either as a standalone analysis or as a complement to reference-based analyses via alternative pipelines (e.g. TASSEL-GBS) or indeed its own reference-independent analysis.

Important Notes

New Version v.2.1 (03/08/2017)

Version 2.1 will be released within a few weeks. It adds indel functionality, allowing the pipeline to call both SNPs and indels.

Additional Trimmomatic flag recommended and error fix (24/02/2017)

GBS-SNP-CROP users, please make note of the following important announcements:

ERROR FIX: We have discovered a minor genotyping error in Script 7 (Line 216), resulting in the incorrect genotyping of secondary allele homozygotes as primary allele homozygotes, specifically in the case where secondary allele read depth is high and primary allele read depth = 1. This error affects <1% of genotyping calls in our test data. The error has now been corrected, and all users should replace their Script 7 with the version available as of this date (22/02/17). We sincerely apologize for this!

ADDITIONAL TRIMMOMATIC FLAG RECOMMENDED: It has come to our attention that Trimmomatic, by default, can discard high-quality R2 reads if they contain any adapter sequence (e.g. when a GBS fragment length is less than the Illumina read length). To avoid this unnecessary creation of singletons, and thus data loss to downstream scripts, we recommend that users activate the "keepBothReads" option within the Trimmomatic ILLUMINACLIP parameter, when using GBS-SNP-CROP to analyze paired-end (PE) data. For example, the current recommended Trimmomatic ILLUMINACLIP parameters for Script 2 are:

-ad TruSeq3-PE.fa:2:30:10:8:true

Please refer to the Trimmomatic user manual for more details.
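To make the recommended value easier to read, the short Python sketch below splits it into the six colon-delimited ILLUMINACLIP fields described in the Trimmomatic manual (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, minimum adapter length, keepBothReads). The `IlluminaClip` type and the `parse_illuminaclip` helper are hypothetical names used only for illustration; they are not part of GBS-SNP-CROP or Trimmomatic. The sketch also assumes all six fields are present, as in the example above, and that the adapter file name contains no colon.

```python
from typing import NamedTuple


class IlluminaClip(NamedTuple):
    """Named view of a six-field ILLUMINACLIP value (illustrative only)."""
    adapter_fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int
    min_adapter_length: int
    keep_both_reads: bool


def parse_illuminaclip(value: str) -> IlluminaClip:
    """Split a colon-delimited ILLUMINACLIP value into named fields.

    Assumes the full six-field form; shorter forms are rejected.
    """
    parts = value.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}")
    fasta, seed, palindrome, simple, min_len, keep = parts
    if keep.lower() not in ("true", "false"):
        raise ValueError(f"keepBothReads must be 'true' or 'false', got {keep!r}")
    return IlluminaClip(
        adapter_fasta=fasta,
        seed_mismatches=int(seed),
        palindrome_clip_threshold=int(palindrome),
        simple_clip_threshold=int(simple),
        min_adapter_length=int(min_len),
        keep_both_reads=(keep.lower() == "true"),
    )


# The value recommended above for Script 2 with paired-end data.
clip = parse_illuminaclip("TruSeq3-PE.fa:2:30:10:8:true")
for name, field_value in clip._asdict().items():
    print(f"{name}: {field_value}")
```

The final field is the one this note is about: with `keep_both_reads` set to true, Trimmomatic retains the reverse read of a pair even when it duplicates the forward read after adapter removal, so the pair stays intact for downstream scripts.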

Version 2.0 (5/11/2016) - Updated on 22/02/2017

GBS-SNP-CROP v.2.0 was released. Please see the release for more information.

Version 1.1 (3/11/2016)

GBS-SNP-CROP v.1.1 was released. Please see the release for more information.

Version 1.0 (1/12/2016)

This is the original version of the GBS-SNP-CROP pipeline.

Pipeline workflow

  • Stage 1. Process the raw GBS data

Step 1: Parse the raw reads
Step 2: Trim based on quality
Step 3: Demultiplex

  • Stage 2. Build the Mock Reference

Step 4: Cluster reads and assemble the Mock Reference

  • Stage 3. Map the processed reads and generate standardized alignment files

Step 5: Align with BWA-mem and process with SAMtools
Step 6: Parse mpileup output and produce the SNP discovery master matrix

  • Stage 4. Call SNPs and Genotypes

Step 7: Filter SNPs and call genotypes
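For readers who script around the pipeline, the stage and step layout above can be captured as a small lookup table. The Python sketch below is only an illustration: the stage and step names are copied from the list above, while the `WORKFLOW` dictionary and the `stage_of` helper are hypothetical and not part of GBS-SNP-CROP.

```python
# Stage and step names are copied from the workflow list above;
# the dictionary layout and the helper are illustrative assumptions.
WORKFLOW = {
    "Stage 1. Process the raw GBS data": [
        "Step 1: Parse the raw reads",
        "Step 2: Trim based on quality",
        "Step 3: Demultiplex",
    ],
    "Stage 2. Build the Mock Reference": [
        "Step 4: Cluster reads and assemble the Mock Reference",
    ],
    "Stage 3. Map the processed reads and generate standardized alignment files": [
        "Step 5: Align with BWA-mem and process with SAMtools",
        "Step 6: Parse mpileup output and produce the SNP discovery master matrix",
    ],
    "Stage 4. Call SNPs and Genotypes": [
        "Step 7: Filter SNPs and call genotypes",
    ],
}


def stage_of(step_number: int) -> str:
    """Return the name of the stage that contains the given step number."""
    prefix = f"Step {step_number}:"
    for stage, steps in WORKFLOW.items():
        if any(step.startswith(prefix) for step in steps):
            return stage
    raise KeyError(f"no such step: {step_number}")


# Print the workflow in order, one stage per block.
for stage, steps in WORKFLOW.items():
    print(stage)
    for step in steps:
        print(f"  {step}")
```

Each of the seven steps corresponds to one of the seven Perl scripts mentioned in the introduction, so a table like this can help keep track of which intermediate files belong to which stage.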

User Manual

For more details, please see the GBS-SNP-CROP User Manual.

Discussion Forum

Follow this link to access the GBS-SNP-CROP Google Group.

Requirements

Citing GBS-SNP-CROP

Melo et al. GBS-SNP-CROP: A reference-optional pipeline for SNP discovery and plant germplasm characterization using genotyping-by-sequencing data. BMC Bioinformatics. 2016. 17:29. DOI 10.1186/s12859-016-0879-y.