format of input files: java.lang.NumberFormatException: For input string: #24
Comments
Piet; |
Hi Brad, Thanks for the very quick reply. Getting my feet wet with variant calling I have grep'ed every possible file for a quote followed by that number, but P On Mon, Jan 12, 2015 at 5:13 PM, Brad Chapman [email protected]
Piet; |
Hi Brad, Unfortunately I can't share the bed file, the data I am using does not Is there a specific bed format that is required, BED6 or BED12. My current Sorry about the inconvenience with the file sharing. Kind Regards, On Wed, Jan 14, 2015 at 6:10 AM, Brad Chapman [email protected]
Piet; |
Hi Brad, Solved the problem, parsed the bed file with a python script (removing INFO 13:52:38,856 HelpFormatter - Date/Time: 2015/01/15 13:52:38 INFO 13:52:38,856 HelpFormatter -INFO 13:52:38,856 HelpFormatter -INFO 13:52:39,702 GenomeAnalysisEngine - Strictness is SILENT A list of these samples: NA00001 To ignore these samples, run with On Wed, Jan 14, 2015 at 5:49 PM, Brad Chapman [email protected]
Piet; For your second problem, it looks like you used the example naming for the sample name in the input YAML ( https://github.com/chapmanb/bcbio.variation#configuration-file Hope this helps get you going and thanks again for all the help debugging this. |
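A quick way to see which sample names a VCF actually carries, so they can be compared against the sample name in the input YAML. This is only a sketch: the function name `vcf_sample_names` is made up for illustration and is not part of bcbio.variation, and reading the second problem as a mismatch between the YAML sample name and the VCF header is an interpretation of the comment above.

```python
def vcf_sample_names(lines):
    """Return the sample names from the #CHROM header line of a VCF.

    `lines` is any iterable of text lines, for example an open file handle.
    Hypothetical helper for illustration, not a bcbio.variation function.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\r\n").split("\t")
            # The first nine columns are fixed (CHROM through FORMAT);
            # every column after them is a sample name.
            return columns[9:]
        if not line.startswith("#"):
            break  # reached variant records without seeing a header line
    return []
```

For example, `with open("case1.vcf") as handle: print(vcf_sample_names(handle))` prints the names to match against the YAML.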
Hi,
I am trying to compare a set of VCF files to a set of confirmed SNPs from a Genome in a Bottle database. I do not have access to the raw FASTQ files, so I am unsure which filters were applied during mapping. I merely have a set of BAM files, VCF files, and a BED region file. I therefore also don't know what post-mapping alterations have been performed.
I have tried to run:
java -jar ~/Downloads/bcbio.variation-0.2.1-standalone.jar variant-compare ref-grading.yaml
where my ref-grading.yaml file contains the following:
dir:
out: grading
prep: grading/prep
experiments:
ref: /export/home/pjones/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa
intervals: ref.bed
summary-level: quick
approach: grade
calls:
file: ref.vcf
remove-refcalls: true
prep: true
preclean: true
remove-refcalls: true
file: case1.vcf
intervals: ref.bed
I get the following error (I am not familiar with Java, though):
2015-01-12 16:48:18,299 [INFO ] MLog clients using log4j logging.
2015-01-12 16:48:18,760 [INFO ] State :begin :: {:desc "Starting variation analysis"}
2015-01-12 16:48:18,788 [INFO ] State :clean :: {:desc "Cleaning input VCF: reference"}
2015-01-12 16:48:18,789 [INFO ] State :merge :: {:desc "Merging multiple input files: reference"}
2015-01-12 16:48:18,790 [INFO ] State :prep :: {:desc "Prepare VCF, resorting to genome build: reference"}
"ava.lang.NumberFormatException: For input string: "14596
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at bcbio.align.ref$prep_bedline_sort$fn__1333.invoke(ref.clj:85)
at bcbio.align.ref$sort_bed_file$fn__1338$fn__1339$fn__1344.invoke(ref.clj:98)
at clojure.core$sort_by$fn__4299.invoke(core.clj:2769)
at clojure.lang.AFunction.compare(AFunction.java:49)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
at java.util.TimSort.sort(TimSort.java:203)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at clojure.core$sort.invoke(core.clj:2754)
at clojure.core$sort_by.invoke(core.clj:2769)
at clojure.core$sort_by.invoke(core.clj:2767)
at bcbio.align.ref$sort_bed_file$fn__1338$fn__1339.invoke(ref.clj:99)
at bcbio.align.ref$sort_bed_file$fn__1338.invoke(ref.clj:97)
at bcbio.align.ref$sort_bed_file.invoke(ref.clj:96)
at bcbio.run.broad$gatk_cl_intersect_intervals$fn__1816.invoke(broad.clj:56)
at clojure.core$map$fn__4207.invoke(core.clj:2487)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$map$fn__4207.invoke(core.clj:2479)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$tree_seq$walk__4647$fn__4648.invoke(core.clj:4475)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.LazySeq.more(LazySeq.java:96)
at clojure.lang.RT.more(RT.java:607)
at clojure.core$rest.invoke(core.clj:73)
at clojure.core$flatten.invoke(core.clj:6478)
at bcbio.run.broad$gatk_cl_intersect_intervals.doInvoke(broad.clj:56)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at bcbio.variation.filter.intervals$select_by_sample.doInvoke(intervals.clj:56)
at clojure.lang.RestFn.invoke(RestFn.java:846)
at bcbio.variation.combine$dirty_prep_work$run_sample_select__1157.invoke(combine.clj:140)
at bcbio.variation.combine$dirty_prep_work.invoke(combine.clj:155)
at bcbio.variation.combine$gatk_normalize.invoke(combine.clj:187)
at bcbio.variation.compare$prepare_vcf_calls$fn__7526.invoke(compare.clj:120)
at clojure.core$map$fn__4207.invoke(core.clj:2487)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.lang.LazilyPersistentVector.create(LazilyPersistentVector.java:31)
at clojure.core$vec.invoke(core.clj:354)
at bcbio.variation.compare$prepare_vcf_calls.invoke(compare.clj:121)
at bcbio.variation.compare$variant_comparison_from_config$iter__7582__7586$fn__7587.invoke(compare.clj:255)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$tree_seq$walk__4647$fn__4648.invoke(core.clj:4475)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.LazySeq.more(LazySeq.java:96)
at clojure.lang.RT.more(RT.java:607)
at clojure.core$rest.invoke(core.clj:73)
at clojure.core$flatten.invoke(core.clj:6478)
at bcbio.variation.compare$variant_comparison_from_config.invoke(compare.clj:254)
at bcbio.variation.compare$_main.invoke(compare.clj:274)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:617)
at bcbio.variation.core$_main.doInvoke(core.clj:35)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at bcbio.variation.core.main(Unknown Source)
I have no idea how to start debugging this. Is there some input file format that I am not aware of? Must my reference.fa be truncated to the same chromosomes as indicated in the BED file?
My aim: to get a good estimate of the false positive/negative rate, as well as possible factors influencing these (such as coverage, entropy of neighbouring regions, mapping quality, etc.).
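The trace shows `Integer.parseInt` failing inside `prep_bedline_sort` while the BED file is being sorted, which suggests that a coordinate column in the BED file is not a clean integer. A minimal sketch for locating such lines, assuming a tab-separated BED file; the function name `find_bad_bed_lines` is hypothetical and not part of bcbio.variation:

```python
def find_bad_bed_lines(lines):
    """Yield (line_number, raw_line) for BED records whose start or end
    column is not a plain base-10 integer.

    `lines` is any iterable of text lines, for example an open file handle.
    Hypothetical helper for illustration only.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and the header-style lines a BED file may carry.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            yield number, raw
            continue
        # Compare strictly: stray whitespace, quotes, or a trailing
        # carriage return in a coordinate field all count as bad.
        if not all(f.isascii() and f.isdigit() for f in fields[1:3]):
            yield number, raw
```

When reading from disk, opening the file with `open("ref.bed", newline="")` keeps the original line endings intact, and printing each hit with `repr()` makes invisible characters visible.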
Additional information:
From the header of the VCF file, the reference appears to be hg19 UCSC (which is what I used). It also appears that the additional chromosomes have been removed from the header and the call list in the VCF file (i.e. only chr1-22 + X + Y). The ref.vcf and BED file were downloaded and appear to have the same UCSC naming convention. My reference is indexed and a GATK dictionary file exists. Java version: JDK 1.7.0_45. CentOS cluster with a Lustre file system.
Kind Regards,
Piet Jones