GitHub - Pinjontall94/asd-q2: qiime2 asd scripts

asd-q2 (Real name TBD)

This is a simple, Snakemake-based pipeline that takes the NCBI accession list for a given SRA# and performs de novo OTU clustering via qiime2's vsearch wrapper. End results are given as Qiime2 Artifacts (.qza files), though these can be extracted the same as any .zip file, if you so choose.
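
Because a .qza file is an ordinary zip archive, it can be unpacked with nothing but the Python standard library. The following is a minimal sketch, not part of this repository; the function name `extract_artifact` and any paths passed to it are illustrative assumptions.

```python
import zipfile
from pathlib import Path


def extract_artifact(qza_path, out_dir):
    """Unpack a .qza archive into out_dir and return the extracted file paths."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(qza_path) as archive:
        names = archive.namelist()
        archive.extractall(out_dir)
    # Directory entries end with "/", so keep only regular files.
    return sorted(out_dir / name for name in names if not name.endswith("/"))
```

Calling `extract_artifact("some_artifact.qza", "extracted")` (a hypothetical file name) would leave the archive contents under "extracted", where they can be browsed like any other unzipped folder.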

Preparation, or "Before you run snakemake"

Clone this repository, then create and activate a new conda environment with the provided environment file:

git clone --depth 1 git@github.com:Pinjontall94/asd-q2.git /your/new/analysis/folder
mamba env create -f environment.yaml
conda activate snakeqiimer

Note: the standard conda tool that comes with Anaconda will work but, as Snakemake itself recommends, I highly encourage you to use mamba (whether on its own or via the mambaforge distribution).

Download the NCBI Accession List (e.g. "SRR_Acc_List.txt") and move it into the asd-q2 folder
Run the following in the asd-q2 folder:

python scripts/srr_munch.py -i SRR_Acc_List.txt -o data
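
The internals of scripts/srr_munch.py are not reproduced here. As a rough illustration of the input it receives, the sketch below reads an accession list (one run accession per line) and rejects anything that does not look like an SRA run accession. The function name `read_accessions` and the strictness of the check are assumptions for illustration, not the script's actual behavior.

```python
import re
from pathlib import Path

# SRA run accessions start with SRR, ERR, or DRR, followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_accessions(path):
    """Return the unique run accessions in an accession list, in file order."""
    seen = []
    for line in Path(path).read_text().splitlines():
        acc = line.strip()
        if not acc:
            continue  # skip blank lines
        if not RUN_ACCESSION.match(acc):
            raise ValueError(f"not a run accession: {acc!r}")
        if acc not in seen:
            seen.append(acc)
    return seen
```

Running a check like this before the download step can catch a truncated or hand-edited list early.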

Modify the config file ("config.yaml") to fit your analysis. Update the following parameters, in plain text, unless otherwise specified:

"AUTHOR": a string containing no spaces (e.g. "Franklin_53")
"primers", "FWD" and "REV": primer sequences (e.g. FWD: GTGCCAGCMGCCGCGGTAA)
Optional: "offset", "FWD" and "REV": integer values only (e.g. FWD: 5), giving the number of nucleotides to trim from the reads' 5' and 3' ends, respectively
Optional: "THREADS", specify the number of CPU threads to allocate to the pipeline (e.g. THREADS: 8)

Example config:

AUTHOR: "Franklin_53"

primers:
  FWD: GTGCCAGCMGCCGCGGTAA
  REV: ATTAGASACCCBDGTAGTCC

# Number of nucleotides to trim from reads' 5' (FWD) and 3' (REV) ends
offset:
  FWD: 5
  REV: 4

THREADS: 8
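
The constraints listed above can be checked before launching a run. The sketch below is not part of the pipeline; it validates an already-parsed config (a plain dict, such as the result of loading config.yaml). The function name `validate_config`, the error messages, and the restriction of primers to IUPAC nucleotide codes are assumptions, the last one inferred from the ambiguity codes in the example primers.

```python
# IUPAC nucleotide codes, including ambiguity codes such as M, S, B, and D.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")

EXAMPLE_CONFIG = {
    "AUTHOR": "Franklin_53",
    "primers": {"FWD": "GTGCCAGCMGCCGCGGTAA", "REV": "ATTAGASACCCBDGTAGTCC"},
    "offset": {"FWD": 5, "REV": 4},
    "THREADS": 8,
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(cfg):
    """Return a list of problems found in a parsed config dict (empty if none)."""
    problems = []

    author = cfg.get("AUTHOR")
    if not isinstance(author, str) or not author or any(c.isspace() for c in author):
        problems.append("AUTHOR must be a non-empty string without spaces")

    primers = cfg.get("primers")
    if not isinstance(primers, dict):
        primers = {}
    for key in ("FWD", "REV"):
        primer = primers.get(key)
        if (
            not isinstance(primer, str)
            or not primer
            or set(primer.upper()) - IUPAC_NUCLEOTIDES
        ):
            problems.append(f"primers.{key} must be a nucleotide string (IUPAC codes)")

    # "offset" and "THREADS" are optional, so only check them when present.
    offset = cfg.get("offset")
    if isinstance(offset, dict):
        for key in ("FWD", "REV"):
            if key in offset and (not _is_int(offset[key]) or offset[key] < 0):
                problems.append(f"offset.{key} must be a non-negative integer")

    threads = cfg.get("THREADS")
    if threads is not None and (not _is_int(threads) or threads < 1):
        problems.append("THREADS must be a positive integer")

    return problems
```

With the example config, `validate_config(EXAMPLE_CONFIG)` returns an empty list; changing AUTHOR to "Franklin 53" would add one entry describing the problem.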

Optional: Visualize the pipeline

Note: Requires graphviz to be installed

(snakeqiimer) /your/new/analysis/folder/asd-q2 $ snakemake --dag | dot -Tsvg > dag.svg

Run the pipeline

Locally / On your device:

Run with:

(snakeqiimer) /your/new/analysis/folder/asd-q2 $ snakemake -cN  # where N = number of cores

Your output files will be stored in a newly made "OTUs" folder.

Pipeline Stages

Download and unzip all fastq.gz's listed in the accession list as SRR numbers, and place them in a "data" folder
Generate a Qiime2-compatible manifest file for the resulting fastqs (Note: only tested on PHRED33 fastqs)
Import Seqs
Merge paired-end reads with q2-vsearch's join pairs
Dereplicate the SampleData[Sequences] artifact
De novo cluster FeatureTable[Frequency] and FeatureData[Sequence] artifacts
Generate FeatureTable and FeatureData summaries
Create a tree for phylogenetic diversity analyses
Determine alpha and beta diversity
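
To make the manifest stage listed above more concrete, here is a minimal sketch of writing a paired-end manifest in Qiime2's tab-separated V2 layout (columns sample-id, forward-absolute-filepath, reverse-absolute-filepath). It is not the pipeline's own rule: the function name `write_manifest` and the `<accession>_1.fastq` / `<accession>_2.fastq` naming of the downloaded reads are assumptions.

```python
import csv
from pathlib import Path


def write_manifest(data_dir, manifest_path):
    """Write a paired-end manifest for <sample>_1.fastq / <sample>_2.fastq pairs."""
    data_dir = Path(data_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for fwd in sorted(data_dir.glob("*_1.fastq")):
        sample = fwd.name[: -len("_1.fastq")]
        rev = data_dir / f"{sample}_2.fastq"
        if not rev.exists():
            raise FileNotFoundError(f"missing reverse reads for {sample}")
        rows.append((sample, str(fwd), str(rev)))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        writer.writerows(rows)
    # Return the sample IDs so the caller can confirm what was included.
    return [row[0] for row in rows]
```

For example, `write_manifest("data", "manifest.tsv")` (hypothetical paths) would list every read pair found in the "data" folder and return the sample IDs it wrote.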

TODO

Add conditional to handle all PHRED values compatible with Qiime2
Add rule for Qiime2 that uses the artifact api
Add examples folder showing sample workflows
Organize rules into a separate folder? (Maybe not necessary)
Add instructions for running remotely via slurm and/or GCP
