An analysis workflow to generate annotated peptide sequences from proteomic spectra using containerized tools.

NCBI-Hackathons/Peptides-SpecTra-Annotations-PaSTA

Peptides SpecTra Annotations (PaSTA)

Background

Unlike in genomics, most tools and workflows for analyzing proteomics data are currently tied either to a specific platform, such as Galaxy, or to an operating system (OS), such as Microsoft Windows or Linux. This lack of publicly available, reusable proteomics tools and workflows that are independent of platform and OS is preventing valuable public proteomic datasets, such as those in NCI’s Proteomic Data Commons, from being analyzed. This proposal is to create an analysis workflow that generates annotated peptide sequences from proteomic spectra using containerized tools.

Challenges in the field

TBD

Workflow

Peptides SpecTra Annotations (PaSTA)

Prerequisite

Installation

Download the git repo

git clone https://github.com/NCBI-Hackathons/Peptides-SpecTra-Annotations-PaSTA
cd Peptides-SpecTra-Annotations-PaSTA

Download the dataset used

wget --recursive --no-parent --reject="index.html*" -e robots=off https://cptc-xfer.uis.georgetown.edu/publicData/Phase_II_Data/TCGA_Colorectal_Cancer_S_022/TCGA-A6-3807-01A-22_Proteome_VU_20121019/TCGA-A6-3807-01A-22_Proteome_VU_20121019_mzML/
gzip -d cptc-xfer.uis.georgetown.edu/publicData/Phase_II_Data/TCGA_Colorectal_Cancer_S_022/TCGA-A6-3807-01A-22_Proteome_VU_20121019/TCGA-A6-3807-01A-22_Proteome_VU_20121019_mzML/TCGA-A6-3807-01A-22_W_VU_201210*.mzML.gz
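Because the recursive download mirrors the remote directory layout, it can help to confirm that the spectra were actually decompressed before running the pipeline. The following is a minimal sketch, not part of the repository: `count_files` is a hypothetical helper, and `DATA_DIR` assumes `wget` was run from the repository root with the URL above.

```shell
# Hypothetical helper: count regular files under DIR whose name matches PATTERN.
count_files() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

# Assumption: wget mirrored the remote path below the current directory.
DATA_DIR="cptc-xfer.uis.georgetown.edu/publicData/Phase_II_Data/TCGA_Colorectal_Cancer_S_022/TCGA-A6-3807-01A-22_Proteome_VU_20121019/TCGA-A6-3807-01A-22_Proteome_VU_20121019_mzML"

if [ -d "$DATA_DIR" ]; then
    echo "decompressed mzML files: $(count_files "$DATA_DIR" '*.mzML')"
    echo "still compressed:        $(count_files "$DATA_DIR" '*.mzML.gz')"
else
    echo "data directory not found: $DATA_DIR" >&2
fi
```

Any files that remain compressed did not match the pattern passed to `gzip -d` above.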

Download Human reference proteome database from UniProt

wget -O AUP000005640_sp.fasta "https://www.uniprot.org/uniprot/?query=reviewed:yes+AND+proteome:UP000005640&format=fasta"
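A quick sanity check on the downloaded database can catch an empty or truncated file before the search step. This sketch counts FASTA header lines (lines starting with `>`); the helper name `count_fasta_entries` is an illustrative assumption, while the file name is the one used in the command above.

```shell
# Hypothetical helper: print the number of sequence entries in a FASTA file.
count_fasta_entries() {
    grep -c '^>' "$1"
}

FASTA="AUP000005640_sp.fasta"

if [ -s "$FASTA" ]; then
    echo "$FASTA: $(count_fasta_entries "$FASTA") protein entries"
else
    echo "$FASTA is missing or empty" >&2
fi
```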

Install MSGFPlus

wget https://github.com/MSGFPlus/msgfplus/releases/download/v2018.07.17/v2018.07.17.zip
mkdir -p software/MSGFPlus
unzip -d software/MSGFPlus v2018.07.17.zip
rm v2018.07.17.zip
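MS-GF+ is distributed as a Java archive, so it needs a Java runtime on the PATH. The sketch below checks both pieces after installation. It assumes the release zip unpacks `MSGFPlus.jar` at the top level of `software/MSGFPlus`; adjust `MSGF_JAR` if the layout differs. `check_msgfplus` is a hypothetical helper, not something shipped with the repository.

```shell
# Assumption: the zip unpacks MSGFPlus.jar directly into software/MSGFPlus.
MSGF_JAR="${MSGF_JAR:-software/MSGFPlus/MSGFPlus.jar}"

# Hypothetical helper: return 1 if the JAR is missing, 2 if java is missing.
check_msgfplus() {
    jar=$1
    if [ ! -f "$jar" ]; then
        echo "missing: $jar" >&2
        return 1
    fi
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 2
    fi
    echo "ok: $jar"
}

check_msgfplus "$MSGF_JAR" || true

# A search would then look roughly like this (file names are placeholders;
# the scripts in examples/ may use different options):
#   java -Xmx3500M -jar "$MSGF_JAR" -s spectra.mzML -d AUP000005640_sp.fasta -o result.mzid
```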

Install Percolator

wget https://github.com/percolator/percolator/releases/download/rel-3-02-01/ubuntu64.tar.gz
tar -xvzf ubuntu64.tar.gz
sudo dpkg -i *.deb
sudo apt-get install -f

Install Mimic

wget https://github.com/percolator/mimic/archive/rel-1-00.zip
unzip rel-1-00.zip
cd mimic-rel-1-00
cmake -DCMAKE_INSTALL_PREFIX=$(pwd)/../software/ src/ && make && make install
cd ..
rm -rf mimic-rel-1-00 rel-1-00.zip

Run the whole pipeline

bash examples/workflow_mimic_msgf_percolator.sh
bash examples/run_one_example.sh

The default Mods.txt can be found in software/MSGFPlus/doc/examples. Instructions for adding custom modifications are also available in Mods.txt.
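As an illustration of what a custom modification file might look like, the sketch below writes a small file with one fixed and one variable modification. The file name `custom_mods.txt` is hypothetical, and the entries are written from memory of the comma-separated format (composition, residue, type, position, name); treat the bundled Mods.txt as the authoritative description.

```shell
# Hypothetical output file name; override by setting MODS_FILE.
MODS_FILE="${MODS_FILE:-custom_mods.txt}"

cat > "$MODS_FILE" <<'EOF'
# Maximum number of variable modifications per peptide
NumMods=2
# Fixed carbamidomethylation of cysteine
C2H3N1O1,C,fix,any,Carbamidomethyl
# Variable oxidation of methionine
O1,M,opt,any,Oxidation
EOF

echo "wrote $MODS_FILE"
```

The file would then be passed to MS-GF+ with its `-mod` option.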

Workflow availability on the NCI Cloud Resources

A proof-of-concept of this workflow has been created on the Seven Bridges Cancer Genomics Cloud using Rabix Composer in Common Workflow Language Version 1.

Schematic of the Workflow on the Seven Bridges Cancer Genomics Cloud

Docker Instructions

A Docker image for the tools in the workflow is available here. The image includes all the prerequisites and dependencies.
To run the Docker image:

docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" -i -t stevetsa/proteomics:latest

This mounts the current working directory to the same directory structure inside the container. You will be able to access all files and folders downstream of the current working directory.
All the executables are in /usr/bin and /usr/local/bin; the MS-GF+ JAR file is at /opt/MSGFPlusv2018.07.17.jar
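For scripted use, the same mount can be applied to a single command instead of an interactive shell. The wrapper below is a sketch under assumptions: `run_in_container` and `DRY_RUN` are hypothetical names, and it presumes the image has no custom entrypoint that would intercept the command.

```shell
IMAGE="${IMAGE:-stevetsa/proteomics:latest}"

# Hypothetical wrapper: run a command inside the container with the current
# directory mounted at the same path. With DRY_RUN=1 it only prints the command.
run_in_container() {
    set -- docker run --rm -v "$PWD:$PWD" -w "$PWD" "$IMAGE" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example (not executed here):
#   run_in_container java -jar /opt/MSGFPlusv2018.07.17.jar
```

Setting `DRY_RUN=1` is a convenient way to inspect the exact `docker run` line before executing it.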

Presentations

Resources

Future development

  • Downstream analysis, for example with the MEME Suite
  • Run the whole pipeline in a Docker image
