Error at the validation step #18

Open
ssaif opened this issue Sep 3, 2014 · 7 comments
Comments

@ssaif

ssaif commented Sep 3, 2014

Hello,

I am trying to incorporate the ensemble approach into my bcbio analysis and am getting errors at the bcbio.variation step that validates the calls. Here are some details:

Run log -
/gpfs/ngs/oncology/Analysis/external/EXT_001_NA12878/EDGE/NA12878_bcbio_NGv3bed/work/run.log

YAML file for bcbio.variation (to validate freebayes calls) - /gpfs/ngs/oncology/Analysis/external/EXT_001_NA12878/EDGE/NA12878_bcbio_NGv3bed/work/validate/NA12878_Germline_NGv3bed/freebayes/config/validate.yaml

Please let me know if you need additional information about the analysis.

Thanks,
Sakina

@chapmanb
Owner

chapmanb commented Sep 4, 2014

Sakina;
Thanks for the report. Happy to look at this if you could make the log and validation files available at a Gist (https://gist.github.com/). Thanks much.

@ssaif
Author

ssaif commented Sep 4, 2014

Hi,

They are available here. Please let me know if you can access them.

https://gist.github.com/ssaif/fbb164d1f28b3f4133c3 (Error lines pasted with flanks from the run log)
https://gist.github.com/ssaif/40228395b0f50f9585e9 (Yaml file for freebayes validation)

Thanks,
Sakina

@chapmanb
Copy link
Owner

chapmanb commented Sep 5, 2014

Sakina;
Thanks for the additional detail. It appears as if something is wrong with one of your input VCF files, specifically that it has truncated lines. The code is failing when it tries to access the reference allele to remove any gaps, and is finding a line with fewer fields than expected:

(letfn [(remove-gap [n xs]
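To illustrate the failure mode in general terms, here is a minimal Python sketch (not the bcbio.variation Clojure code quoted above) of reading the REF allele from a VCF data line. The function name `get_ref_allele` is hypothetical; the only assumption is the standard VCF layout of eight fixed tab-separated columns with REF in the fourth position.

```python
# Hypothetical sketch, not bcbio.variation's implementation: guard access to
# the REF column so a truncated VCF line produces a descriptive error.
VCF_FIXED_COLUMNS = 8  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
REF_INDEX = 3          # zero-based position of REF among the fixed columns


def get_ref_allele(line: str, line_no: int) -> str:
    """Return the REF allele of a VCF data line, or raise ValueError if truncated."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < VCF_FIXED_COLUMNS:
        raise ValueError(
            f"VCF line {line_no}: expected at least {VCF_FIXED_COLUMNS} "
            f"tab-separated fields, found {len(fields)}"
        )
    return fields[REF_INDEX]
```

Without the length check, a line cut off after POS would fail with a bare `IndexError` on `fields[3]`, which says nothing about where in the file the problem is.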

It would be worth checking the input VCF to see if something is wrong:

bcftools view /gpfs/ngs/oncology/Analysis/external/EXT_001_NA12878/EDGE/NA12878_bcbio_NGv3bed/work/freebayes/NA12878_Germline_NGv3bed-effects-ploidyfix-filter.vcf.gz

This should spit out the file and perhaps give a better error message to help debug. Hope this helps some with identifying the issue.
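As a complement to `bcftools view`, a small script can report the line numbers of malformed records directly. This is a minimal sketch under stated assumptions: the script name `check_vcf.py` and its functions are hypothetical, and it only checks column counts (against the `#CHROM` header and the eight fixed columns), not the content of each field.

```python
import gzip
import sys
from typing import Iterable, List, Tuple

MIN_FIELDS = 8  # the eight fixed VCF columns


def find_malformed_lines(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for data lines with a bad column count.

    Data lines are compared against the column count of the #CHROM header
    when one is present, and always against the eight fixed columns.
    """
    expected = None
    bad: List[Tuple[int, int]] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            expected = len(fields)  # header defines the column layout
            continue
        n = len(fields)
        if n < MIN_FIELDS or (expected is not None and n != expected):
            bad.append((line_no, n))
    return bad


def scan_vcf(path: str) -> List[Tuple[int, int]]:
    """Scan a plain or gzip/bgzip-compressed VCF for malformed data lines."""
    # BGZF is a series of gzip members, which gzip.open reads transparently.
    if path.endswith(".gz"):
        with gzip.open(path, "rt") as handle:
            return find_malformed_lines(handle)
    with open(path, "rt") as handle:
        return find_malformed_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    for line_no, n in scan_vcf(sys.argv[1]):
        print(f"line {line_no}: {n} fields")
```

Usage would be `python check_vcf.py <file.vcf.gz>`; no output means every data line has the expected number of columns.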

@ssaif
Author

ssaif commented Sep 5, 2014

Hi Brad,

Thanks for the quick response. I did a few checks on the VCF file and it seems to check out OK.

Another thing I want to point out: in this run of bcbio, where I am also doing the ensemble step, there are VCF files within each caller directory that seem to contain a combined call set (from all chromosomes). I don't typically see these in runs without bcbio.variation, and the VCF file where you pointed out the error is one such combined calls file. Are these combined output files part of the bcbio.variation run?
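For context, a combined call set like this can be produced by concatenating the per-chromosome VCFs: keep one header, then append the data lines of each part in order. The sketch below is a hypothetical illustration (the thread does not show how bcbio builds these files; for real bgzipped files a tool such as `bcftools concat` does this job). It assumes all parts share the same header and are already in genome order. One consequence worth noting: a truncated line in any part carries straight into the combined file.

```python
from typing import List


def concat_vcf_texts(parts: List[str]) -> str:
    """Concatenate per-region VCF texts into one call set.

    Assumes every part shares the same header and that the parts are
    already in genome order; only the first part's header is kept.
    """
    if not parts:
        raise ValueError("no VCF parts to concatenate")
    out: List[str] = []
    for i, text in enumerate(parts):
        for line in text.splitlines():
            if line.startswith("#") and i > 0:
                continue  # drop repeated headers from later parts
            out.append(line)
    return "\n".join(out) + "\n"
```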

To test this, I will run bcbio.variation standalone on the per-chromosome calls, which will hopefully reproduce this behaviour/error.

Thanks,
Sakina

@ssaif
Author

ssaif commented Sep 5, 2014

I forgot to mention that I also found the freebayes VCF did not have calls on chrM, because the Nimblegen BED file does not include chrM regions. However, the GiaB NIST VCF and BED files (hg19) that I am using to validate my calls do include chrM (the files start with it). Could this be the cause of the bcbio.variation error I am getting?
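A quick way to see this kind of mismatch is to list the contigs that appear in the truth-set regions but not in the target regions. This is a minimal, hypothetical sketch: it reads only the first column of BED (or VCF) data lines and skips header-style lines starting with `#`, `track` or `browser`.

```python
from typing import Iterable, List, Set

SKIP_PREFIXES = ("#", "track", "browser")  # BED/VCF header-style lines


def contigs_in(lines: Iterable[str]) -> Set[str]:
    """Collect first-column contig names from BED or VCF data lines."""
    contigs: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith(SKIP_PREFIXES):
            continue
        contigs.add(line.split("\t", 1)[0])
    return contigs


def missing_contigs(truth_lines: Iterable[str], target_lines: Iterable[str]) -> List[str]:
    """Contigs present in the truth-set regions but absent from the target regions."""
    return sorted(contigs_in(truth_lines) - contigs_in(target_lines))
```

Passing the open truth-set BED and capture BED files (any iterable of lines works) would list chrM for the setup described here.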

Thanks,
Sakina

@ssaif
Author

ssaif commented Sep 5, 2014

This was run with bcbio version 0.8.1a (alpha); I'm not sure if I mentioned this earlier.

Thanks,
Sakina

@chapmanb
Owner

chapmanb commented Sep 7, 2014

Sakina;
Thanks for looking into this more. I added better debugging to a snapshot release of bcbio.variation. If you download it and replace the existing version, it should report the exact line in the VCF where it is failing:

wget https://github.com/chapmanb/bcbio.variation/releases/download/v0.1.8-SNAPSHOT-20140906/bcbio.variation-0.1.8-SNAPSHOT-standalone.jar
mv bcbio.variation-0.1.8-SNAPSHOT-standalone.jar /group/ngs/src/bcbio-nextgen/0.8.1a/rhel6-x64/share/java/bcbio_variation/
rm /group/ngs/src/bcbio-nextgen/0.8.1a/rhel6-x64/share/java/bcbio_variation/bcbio.variation-0.1.7-standalone.jar

Regarding your other observations, the comparison handles cases where the regions differ between the input and reference calls: it only compares regions present in both, so the missing chrM regions shouldn't be an issue. The combined VCFs are also prepared independently of the bcbio.variation evaluation. That is done for all calling; the combined file is the final input file, concatenated from the individual input files.
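The idea of comparing only in regions present in both sets can be sketched as: intersect the two BED region sets, then keep only variants that fall in the shared regions. This is a hypothetical illustration, not bcbio.variation's implementation. It assumes the intervals for each chromosome are already merged (non-overlapping) and uses BED's 0-based half-open coordinates against VCF's 1-based POS. A contig such as chrM that appears in only one region set simply drops out.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # BED-style half-open [start, end), 0-based
Regions = Dict[str, List[Interval]]


def intersect_regions(a: Regions, b: Regions) -> Regions:
    """Intersect two region sets; each chromosome's intervals must be merged."""
    out: Regions = {}
    for chrom in sorted(set(a) & set(b)):  # contigs missing from either side drop out
        xs, ys = sorted(a[chrom]), sorted(b[chrom])
        i = j = 0
        hits: List[Interval] = []
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:
                hits.append((start, end))
            # advance whichever interval finishes first
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
        if hits:
            out[chrom] = hits
    return out


def variant_in_regions(chrom: str, pos: int, regions: Regions) -> bool:
    """True if a 1-based VCF POS falls inside any 0-based half-open BED interval."""
    return any(start < pos <= end for start, end in regions.get(chrom, []))
```

The two-pointer loop is linear in the number of intervals per chromosome, which is why it requires sorted, merged input.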

Hopefully re-running with the updated code will identify the problematic VCF line and shed more light on what is happening.
