GBS SNP Calling Reference Optional Pipeline (GBS-SNP-CROP)

Introduction

The GBS SNP Calling Reference Optional Pipeline (GBS-SNP-CROP) is executed via a sequence of seven Perl scripts that integrate custom parsing and filtering procedures with well-known, vetted bioinformatic tools, giving the user full access to all intermediate files. By employing a novel SNP-calling strategy based on the correspondence between within-individual and across-population patterns of polymorphism, the pipeline identifies high-confidence SNPs and distinguishes them from both sequencing and PCR errors. The pipeline adopts a clustering strategy to build a population-tailored "Mock Reference" from the same GBS data, which is then used for downstream SNP calling and genotyping. Designed for libraries of either paired-end (PE) or single-end (SE) reads of arbitrary lengths, GBS-SNP-CROP maximizes data usage by avoiding the unnecessary data culling that read-length uniformity requirements would otherwise impose.

GBS-SNP-CROP is a complete bioinformatics pipeline developed primarily to support curation, research, and breeding programs wishing to use GBS for the cost-effective genome-wide characterization of plant genetic resources, particularly in the absence of a reference genome. The pipeline can, however, also be used when a reference genome is available, either as a standalone analysis or as a complement to reference-based analyses via alternative pipelines (e.g. TASSEL-GBS) or to its own reference-independent analysis.

Important Notes

New Version v.2.1 (03/08/2017)

Version 2.1 will be released within a few weeks. It adds indel functionality, allowing both SNPs and indels to be called.

Additional Trimmomatic flag recommended and error fix (24/02/2017)

GBS-SNP-CROP users, please make note of the following important announcements:

ERROR FIX: We have discovered a minor genotyping error in Script 7 (Line 216) that caused secondary-allele homozygotes to be incorrectly genotyped as primary-allele homozygotes, specifically when the secondary allele read depth is high and the primary allele read depth is 1. This error affects <1% of genotype calls in our test data. The error has now been corrected, and all users should replace their Script 7 with the version available as of this date (22/02/17). We sincerely apologize for this!
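To make the affected case concrete, here is a minimal sketch of depth-based genotype calling from primary (P) and secondary (S) allele read depths. The thresholds and the function name `call_genotype` are hypothetical and chosen only for illustration; this is not the logic or the parameter set of Script 7. What it shows is that the homozygote rules must be applied symmetrically to both alleles, so a high secondary depth paired with a primary depth of 1 yields a secondary-allele homozygote rather than a primary-allele one.

```python
from typing import Optional

# Hypothetical thresholds for illustration only; they are NOT Script 7's parameters.
MIN_HOMO_DEPTH_ALT0 = 10  # min depth of the called allele when the other allele has depth 0
MIN_HOMO_DEPTH_ALT1 = 30  # min depth of the called allele when the other allele has depth 1
MIN_HET_DEPTH = 3         # min depth of each allele for a heterozygote call


def call_genotype(primary: int, secondary: int) -> Optional[str]:
    """Return 'P/P', 'S/S', 'P/S', or None (missing) from two allele read depths."""
    # Heterozygote: both alleles are supported by enough reads.
    if primary >= MIN_HET_DEPTH and secondary >= MIN_HET_DEPTH:
        return "P/S"
    # Homozygote rules are checked for each allele in turn, so a deep secondary
    # allele with a single primary read is called 'S/S', never 'P/P'.
    for major, minor, label in ((primary, secondary, "P/P"), (secondary, primary, "S/S")):
        if minor == 0 and major >= MIN_HOMO_DEPTH_ALT0:
            return label
        if minor == 1 and major >= MIN_HOMO_DEPTH_ALT1:
            return label
    # Not enough evidence for any call: leave the genotype missing.
    return None
```

For example, `call_genotype(1, 60)` returns `"S/S"` and `call_genotype(60, 1)` returns `"P/P"`.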

ADDITIONAL TRIMMOMATIC FLAG RECOMMENDED: It has come to our attention that Trimmomatic, by default, can discard high-quality R2 reads when they contain adapter sequence (e.g. when a GBS fragment is shorter than the Illumina read length). To avoid this unnecessary creation of singletons, and the resulting loss of data to downstream scripts, we recommend that users activate the "keepBothReads" option of the Trimmomatic ILLUMINACLIP parameter when using GBS-SNP-CROP to analyze paired-end (PE) data. For example, the currently recommended Trimmomatic ILLUMINACLIP parameters for Script 2 are:

-ad TruSeq3-PE.fa:2:30:10:8:true

Please refer to the Trimmomatic user manual for more details.
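The ILLUMINACLIP value is a colon-separated list whose fields, in the order given in the Trimmomatic manual, are the adapter FASTA file, seed mismatches, palindrome clip threshold, simple clip threshold, minAdapterLength, and keepBothReads; the last two are optional. The hypothetical helper below simply splits such a string so that the meaning of each number is explicit. It assumes, as the example above suggests, that the value given to Script 2 via `-ad` follows the same field order.

```python
from dataclasses import dataclass


@dataclass
class IlluminaClip:
    adapter_fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int
    min_adapter_length: int = 8    # Trimmomatic default when the field is omitted
    keep_both_reads: bool = False  # Trimmomatic default when the field is omitted


def parse_illuminaclip(value: str) -> IlluminaClip:
    """Split a value such as 'TruSeq3-PE.fa:2:30:10:8:true' into named fields."""
    fields = value.split(":")
    if not 4 <= len(fields) <= 6:
        raise ValueError(f"expected 4 to 6 colon-separated fields, got {len(fields)}")
    clip = IlluminaClip(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))
    if len(fields) >= 5:
        clip.min_adapter_length = int(fields[4])
    if len(fields) == 6:
        # The final field is the keepBothReads switch recommended above for PE data.
        clip.keep_both_reads = fields[5].lower() == "true"
    return clip
```

Parsing `"TruSeq3-PE.fa:2:30:10:8:true"` gives a minimum adapter length of 8 and `keep_both_reads=True`; dropping the trailing `:true` falls back to Trimmomatic's default of discarding the reverse read after palindrome read-through.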

Version 2.0 (5/11/2016) - Updated on 22/02/2017

GBS-SNP-CROP v.2.0 has been released. Please see that release for more information.

Version 1.1 (3/11/2016)

GBS-SNP-CROP v.1.1 has been released. Please see that release for more information.

Version 1.0 (1/12/2016)

This is the original version of the GBS-SNP-CROP pipeline.

Pipeline workflow

Stage 1. Process the raw GBS data

Step 1: Parse the raw reads
Step 2: Trim based on quality
Step 3: Demultiplex
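Parsing and demultiplexing both depend on recognizing, at the start of each read, a sample barcode immediately followed by the restriction-site remnant. The sketch below illustrates that check under stated assumptions: the barcode table, the sample names, and the `TGCA` remnant (assumed here to be a PstI overhang) are hypothetical, and the function is not taken from Scripts 1 or 3, which also deal with quality scores, read pairs, and file I/O.

```python
from typing import Dict, Optional, Tuple

# Hypothetical barcode-to-sample table; real barcodes come from your own barcode file.
BARCODES = {"ACGT": "sample_1", "TTAGGC": "sample_2"}
# Assumed restriction-site remnant (e.g. for PstI); adjust for your enzyme.
CUT_SITE_REMNANT = "TGCA"


def assign_read(seq: str,
                barcodes: Dict[str, str] = BARCODES,
                remnant: str = CUT_SITE_REMNANT) -> Optional[Tuple[str, str]]:
    """Return (sample, read without barcode) if the read starts with barcode + remnant."""
    # Try longer barcodes first so a shorter barcode cannot claim a longer one's reads.
    for bc in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(bc + remnant):
            # Strip the barcode but keep the cut-site remnant at the read start.
            return barcodes[bc], seq[len(bc):]
    # No barcode + remnant match: the read would be discarded.
    return None
```

Requiring the remnant right after the barcode is what lets reads with sequencing errors in the barcode region be rejected instead of being assigned to the wrong sample.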

Stage 2. Build the Mock Reference

Step 4: Cluster reads and assemble the Mock Reference
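Clustering tools such as USEARCH (listed under Requirements) are commonly run on dereplicated, abundance-sorted reads. The hypothetical `dereplicate` function below shows only that exact-match collapsing step, as a simplified stand-in: it does not perform the identity-threshold clustering or the assembly used to build the Mock Reference, and it is not a reproduction of Script 4.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(reads: Iterable[str], min_size: int = 1) -> List[Tuple[str, int]]:
    """Collapse identical sequences into (sequence, count) pairs, most abundant first."""
    counts = Counter(reads)
    # Sort by decreasing abundance, then by sequence so ties have a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Optionally drop rare sequences (e.g. singletons) before clustering.
    return [(seq, n) for seq, n in ranked if n >= min_size]
```

Processing abundant sequences first means that, in a greedy clustering pass, the most frequent variant of each locus tends to become the cluster centroid.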

Stage 3. Map the processed reads and generate standardized alignment files

Step 5: Align with BWA-mem and process with SAMtools
Step 6: Parse mpileup output and produce the SNP discovery master matrix
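Step 6 turns mpileup output into per-allele read depths. The sketch below shows how the read-bases column of a single `samtools mpileup` line can be converted into A/C/G/T counts, following the documented mpileup encoding (`.`/`,` for reference matches, `^` plus a mapping-quality character at read starts, `$` at read ends, `+n`/`-n` followed by n indel bases). The function name is hypothetical, it ignores base qualities, and it is not the code of Script 6.

```python
import re
from typing import Dict

INDEL = re.compile(r"[+-](\d+)")


def count_alleles(ref: str, bases: str) -> Dict[str, int]:
    """Count A/C/G/T support in the read-bases column of one mpileup line."""
    counts = {b: 0 for b in "ACGT"}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":  # read start; the next character is its mapping quality
            i += 2
            continue
        if ch in "+-":  # indel: skip the length digits and the inserted/deleted bases
            m = INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
            continue
        # '.' and ',' are matches to the reference on the forward/reverse strand.
        base = ref.upper() if ch in ".," else ch.upper()
        if base in counts:  # '$', '*', '#', '>', '<' and N are not counted
            counts[base] += 1
        i += 1
    return counts
```

For instance, `count_alleles("A", ".,,^]G$T+2AC..-1g,")` counts six reference A reads, one G, and one T, while the inserted `AC` and deleted `g` are skipped rather than counted as alleles.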

Stage 4. Call SNPs and Genotypes

Step 7: Filter SNPs and call genotypes

User Manual

For more details, please see the GBS-SNP-CROP User Manual.

Discussion Forum

Please see the GBS-SNP-CROP Google Group.

Requirements

Java 7 or higher (we used Java 8)
Trimmomatic v.0.33 (Bolger et al., 2014)
PEAR v.0.96 (Zhang et al., 2014)
Usearch v.8.0.1623 (Edgar, 2010)
BWA aligner v.0.7.12 (Li & Durbin, 2009)
SAMtools v.1.2 (Li et al., 2009)
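Before running the scripts, it can help to confirm that the required tools are reachable from the shell. The hypothetical helper below only checks whether each executable is found on PATH; it does not verify versions. The executable names are assumptions: USEARCH binaries are often distributed with a version-suffixed file name, and Trimmomatic is usually run as a .jar through java rather than as a command on PATH, so adjust the list to your installation.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed executable names; edit to match how the tools are installed on your system.
DEFAULT_TOOLS = ("java", "pear", "usearch", "bwa", "samtools")


def find_tools(names: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```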

Citing GBS-SNP-CROP

Melo et al. GBS-SNP-CROP: A reference-optional pipeline for SNP discovery and plant germplasm characterization using genotyping-by-sequencing data. BMC Bioinformatics. 2016. 17:29. DOI 10.1186/s12859-016-0879-y.
