Commit 90f9962

Author: Tim Bergquist, committed Feb 13, 2019
parallelization enabled
1 parent e149700 commit 90f9962

9 files changed, +100601 -72 lines changed

README.md

+3 -3
@@ -62,12 +62,12 @@ bash install.sh
 [MutPredLOF](http://mutpredlof.cs.indiana.edu/#dload)<br>
 [MutPred-Indel](http://mutpredindel.cs.indiana.edu/#dload)
 
-Put them into the [tools](/tools) directory and use *tar -xvzf* to unzip the tarballs.
+Put them into the [tools](/tools) directory and use *tar -xvzf* to unzip the tarballs. You will need to find and download the MATLAB MCR in order to run MutPredLOF and MutPred-Indel. MutPred2 comes with a MATLAB MCR so copying that folder into the other two is an option. Just pointing to the MutPred2 MCR directory creates path issues.
 
 ## Annovar
-Do to licensing issues, we can't include Annovar in the source code. Go to [Annovar](http://annovar.openbioinformatics.org/en/latest/user-guide/download/) and fill out the form to receive a link to download the tool. Add the package to the [tools](/tools) folder.
+Due to licensing issues, we can't include Annovar in the source code. Go to [Annovar](http://annovar.openbioinformatics.org/en/latest/user-guide/download/) and fill out the form to receive a link to download the tool. Add the package to the [tools](/tools) folder.
 
-Go to the main [annovar directory](/tools/annovar/) and run the command:
+Go to the main annovar directory (/tools/annovar/) and run the command:
 ```
 perl annotate_variation.pl -downdb -buildver hg19 -webfrom annovar refGene humandb/
 ```
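
The README steps above leave several manually installed pieces under `tools/`. A minimal pre-flight sketch, assuming the paths that the Snakefile in this commit references, could report which of them are still missing before the workflow is started. The helper name `missing_paths` and the `REQUIRED_PATHS` list are illustrative and not part of the repository.

```python
from pathlib import Path

# Paths copied from the Snakefile in this commit. The check itself is an
# illustrative assumption and does not exist in the repository.
REQUIRED_PATHS = [
    "tools/annovar/coding_change.pl",
    "tools/annovar/humandb/hg19_refGeneMrna.fa",
    "tools/annovar/humandb/hg19_refGene.txt",
    "tools/mutpred2.0/run_mutpred2.sh",
    "tools/MutPredLOF/run_MutPredLOF.sh",
    "tools/MutPredIndel_compiled/run_MutPredIndel.sh",
]


def missing_paths(repo_root, required=REQUIRED_PATHS):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per missing file.
    for rel in missing_paths("."):
        print("missing:", rel)
```

This only checks that the files exist. It does not verify the MATLAB MCR copy described in the README, since the diff does not name that folder.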

‎Snakefile

+48 -40
@@ -1,5 +1,5 @@
 
-
+configfile: "config.json"
 # want to get this from the command line, or a directory
 
 
@@ -17,23 +17,21 @@ BASE = "small_sample"
 
 # wildcars
 VARTYPES = ["missense", "LOF", "indels"]
-
+ALL_THREADS = [num for num in range(config["num_threads"])]
+NUM_THREADS = max(ALL_THREADS) + 1
 
 # final output is the input
 # for glob_wildcard, will likely need an expand here
+
 rule all:
     input:
-        MAIN_DIR + "intermediates/annovar/" + BASE + ".full.avinput",
-        MAIN_DIR + "intermediates/annovar/" + BASE + ".exonic_variant_function",
-        expand(MAIN_DIR + "intermediates/splits/" + BASE + ".{vartype}_0.exonic_variant_function", vartype=VARTYPES),
-        expand(MAIN_DIR + "intermediates/faa/" + BASE + ".{vartype}_0.faa", vartype=VARTYPES),
-        MAIN_DIR + "intermediates/scores/" + BASE + ".missense_0.csv",
-        MAIN_DIR + "intermediates/scores/" + BASE + ".LOF_0_output.txt",
-        MAIN_DIR + "intermediates/scores/" + BASE + ".indels_0_output.txt",
-        MAIN_DIR + "data/" + BASE + ".vcf.tmp"
+        expand(MAIN_DIR + "intermediates/faa/" + BASE + ".{vartype}_{num_threads}.faa", vartype=VARTYPES, num_threads=ALL_THREADS),
+        expand(MAIN_DIR + "intermediates/scores/" + BASE + ".{vartype}_{num_threads}_output.txt", vartype=VARTYPES, num_threads=ALL_THREADS),
+        MAIN_DIR + "data/" + BASE + ".annotated.vcf",
+        MAIN_DIR + "data/" + BASE + ".scored.vcf"
 
 
-ruleorder: annovar_convert > annovar_annotate > splitter > coding_change > MutPred2 > MutPred_LOF > MutPred_indel > merge
+ruleorder: annovar_convert > annovar_annotate > splitter > coding_change > MutPred2 > MutPred_LOF > MutPred_indel > Merge
 
 # first run annovar - there are two steps
 rule annovar_convert:
@@ -67,61 +65,71 @@ rule splitter:
     input:
         rules.annovar_annotate.output.var_fxn
     output:
-        splits=MAIN_DIR + "intermediates/splits/" + BASE + ".{vartype}_0.exonic_variant_function"
+        expand(MAIN_DIR + "intermediates/splits/" + BASE + ".{vartype}_{num_threads}.exonic_variant_function", vartype=VARTYPES, num_threads=ALL_THREADS)
+    threads:
+        NUM_THREADS
     shell:
-        "{params.cmd} --target {input} --output {params.output_folder}"
+        "{params.cmd} -threads " + str(NUM_THREADS) + " --target {input} --output {params.output_folder}"
 
 rule coding_change:
-    params:
-        cmd="tools/annovar/coding_change.pl",
-        ops="-includesnp",
-        refGeneMrna="tools/annovar/humandb/hg19_refGeneMrna.fa",
-        refGene="tools/annovar/humandb/hg19_refGene.txt"
-    input:
-        rules.splitter.output.splits
-    output:
-        faa_file=MAIN_DIR + "intermediates/faa/" + BASE + ".{vartype}_0.faa"
-    shell:
-        "{params.cmd} {params.ops} {input} {params.refGene} {params.refGeneMrna} > {output}"
+    params:
+        cmd="perl tools/annovar/coding_change.pl",
+        ops="-includesnp",
+        refGeneMrna="tools/annovar/humandb/hg19_refGeneMrna.fa",
+        refGene="tools/annovar/humandb/hg19_refGene.txt"
+    input:
+        MAIN_DIR + "intermediates/splits/" + BASE + ".{vartype}_{num_threads}.exonic_variant_function"
+    output:
+        MAIN_DIR + "intermediates/faa/" + BASE + ".{vartype}_{num_threads}.faa"
+    threads:
+        1
+    shell:
+        "{params.cmd} {input} {params.refGene} {params.refGeneMrna} {params.ops} > {output}"
 
 
 rule MutPred2:
     input:
-        MAIN_DIR + "intermediates/faa/" + BASE + ".missense_0.faa"
+        MAIN_DIR + "intermediates/faa/" + BASE + ".missense_{num_threads}.faa"
     output:
-        MAIN_DIR + "intermediates/scores/" + BASE + ".missense_0.csv"
+        MP2=MAIN_DIR + "intermediates/scores/" + BASE + ".missense_{num_threads}_output.txt"
+    threads:
+        2
     shell:
-        "tools/mutpred2.0/run_mutpred2.sh -i {input} -p 1 -c 1 -b 0 -t 0.05 -f 2 -o {output}"
+        "cd tools/mutpred2.0 && ./run_mutpred2.sh -i {input} -p 1 -c 1 -b 0 -t 0.05 -f 2 -o {output}"
 
 
 rule MutPred_LOF:
     params:
-        outfile_prefix=MAIN_DIR + "intermediates/scores/" + BASE + ".LOF_0"
+        outfile_prefix=MAIN_DIR + "intermediates/scores/" + BASE + ".LOF_{num_threads}"
     input:
-        MAIN_DIR + "intermediates/faa/" + BASE + ".LOF_0.faa"
+        MAIN_DIR + "intermediates/faa/" + BASE + ".LOF_{num_threads}.faa"
     output:
-        MAIN_DIR + "intermediates/scores/" + BASE + ".LOF_0_output.txt"
+        MPL=MAIN_DIR + "intermediates/scores/" + BASE + ".LOF_{num_threads}_output.txt"
+    threads:
+        6
     shell:
         "cd tools/MutPredLOF && ./run_MutPredLOF.sh v91/ {input} {params.outfile_prefix}"
 
 rule MutPred_indel:
     params:
-        outfile_prefix=MAIN_DIR + "intermediates/scores/" + BASE + ".indels_0"
+        outfile_prefix=MAIN_DIR + "intermediates/scores/" + BASE + ".indels_{num_threads}"
     input:
-        MAIN_DIR + "intermediates/faa/" + BASE + ".indels_0.faa"
+        MAIN_DIR + "intermediates/faa/" + BASE + ".indels_{num_threads}.faa"
     output:
-        MAIN_DIR + "intermediates/scores/" + BASE + ".indels_0_output.txt"
+        MPI=MAIN_DIR + "intermediates/scores/" + BASE + ".indels_{num_threads}_output.txt"
+    threads:
+        6
     shell:
         "cd tools/MutPredIndel_compiled && ./run_MutPredIndel.sh v91/ {input} {params.outfile_prefix}"
 
-
-rule merge:
+rule Merge:
     input:
-        MAIN_DIR + "intermediates/scores/" + BASE + ".LOF_0_output.txt",
-        MAIN_DIR + "intermediates/scores/" + BASE + ".indels_0_output.txt",
-        MAIN_DIR + "intermediates/scores/" + BASE + ".missense_0.csv"
+        expand(MAIN_DIR + "intermediates/scores/" + BASE + ".{vartype}_{num_threads}_output.txt", vartype=VARTYPES, num_threads=ALL_THREADS)
     output:
-        MAIN_DIR + "data/" + BASE + ".vcf.tmp"
+        MAIN_DIR + "data/" + BASE + ".annotated.vcf",
+        MAIN_DIR + "data/" + BASE + ".scored.vcf"
+    threads:
+        NUM_THREADS
     shell:
-        "python mutpred_merge.py --base " + BASE
+        "python mutpred_merge.py --vcf " + VCFFILE
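
The splitter rule in the Snakefile diff now declares one output file per variant type and chunk index, and passes `-threads` to the splitter command. The splitter script is not part of this diff, so the following is only a sketch of one way such a script could distribute variants: round-robin assignment of input lines to chunks. The function names `split_round_robin` and `chunk_path` are hypothetical; the file name pattern follows the `intermediates/splits/` pattern shown in the rule, without the `MAIN_DIR` prefix.

```python
def split_round_robin(lines, num_chunks):
    """Assign line i to chunk i % num_chunks and return the list of chunks."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    chunks = [[] for _ in range(num_chunks)]
    for index, line in enumerate(lines):
        chunks[index % num_chunks].append(line)
    return chunks


def chunk_path(base, vartype, chunk_index):
    """Build a split file name following the pattern used by the splitter rule."""
    return f"intermediates/splits/{base}.{vartype}_{chunk_index}.exonic_variant_function"


if __name__ == "__main__":
    # Hypothetical variant records; real input would be ANNOVAR's exonic_variant_function lines.
    variants = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
    for i, chunk in enumerate(split_round_robin(variants, 3)):
        print(chunk_path("small_sample", "LOF", i), chunk)
```

Whatever strategy the real splitter uses, Snakemake expects every declared output file to exist once the rule finishes, so a chunk that receives no variants still needs an (empty) file written.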

‎config.json

+3
@@ -0,0 +1,3 @@
+{
+    "num_threads": 5
+}
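
To see what `"num_threads": 5` produces in the Snakefile, the sketch below reproduces the `ALL_THREADS` / `NUM_THREADS` computation in plain Python and lists the score files that `rule all` would request. `expand_targets` is a hypothetical stand-in that approximates Snakemake's `expand()` with `itertools.product`. The empty `MAIN_DIR` is a placeholder, since its real value is not shown in the diff. `BASE` comes from the hunk header.

```python
import json
from itertools import product

# Same content as the config.json added in this commit.
config = json.loads('{"num_threads": 5}')

VARTYPES = ["missense", "LOF", "indels"]
ALL_THREADS = [num for num in range(config["num_threads"])]  # chunk indices 0..4
NUM_THREADS = max(ALL_THREADS) + 1  # equals config["num_threads"] when it is >= 1

MAIN_DIR = ""  # placeholder assumption; the real value is defined elsewhere in the Snakefile
BASE = "small_sample"


def expand_targets(pattern, **wildcards):
    """Approximate Snakemake's expand(): one formatted path per wildcard combination."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


score_targets = expand_targets(
    MAIN_DIR + "intermediates/scores/" + BASE + ".{vartype}_{num_threads}_output.txt",
    vartype=VARTYPES,
    num_threads=ALL_THREADS,
)

if __name__ == "__main__":
    print(NUM_THREADS, "chunks,", len(score_targets), "score files")
    for path in score_targets:
        print(path)
```

Two things follow from this computation. The `{num_threads}` wildcard is really a chunk index rather than a thread count, and a `num_threads` of 0 would make `max(ALL_THREADS)` fail on an empty list.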
