Poor SNP calling performance #937
Comments
Hi @valeandri, under your "Command on gpu docker image:" section, your first line was:
I believe that's probably just a typo. It should be call_variants. Other than that, I went through your command and it looks reasonable to me. Given that your recall is surprisingly low, can you share your hap.py command? And, as another check, can you run through https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-pacbio-model-case-study.md and confirm that it worked for you? Thanks!
Hi @pichuan, Thanks for looking into that, and sorry for the typo. I used call_variants, as you mentioned. My hap.py command is the following:
I used the input files from https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-pacbio-model-case-study.md (HG003.SPRQ.pacbio.GRCh38.nov2024.chr20.bam) which I realigned with pbmm2 because some contigs were different from my fasta:
I hope this gives you enough information to understand the possible issue. Valentina
Hi @valeandri, There is one issue I noticed in your postprocessing command: you did not specify the small model's output file. DeepVariant uses two models to call variants: a CNN model for the most difficult variants and a "small model" for the less difficult ones. The small model writes its predictions to a separate file, and you need to pass that file to postprocessing when you run it. Since the small model calls most of the variants, omitting it during postprocessing could explain the poor metrics you observed. Hope that helps!
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.8/docs/FAQ.md:
YES
Describe the issue:
I used the example data described in https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-pacbio-model-case-study.md and got very low SNP and INDEL calling performance.
Setup
Steps to reproduce:
The resulting hap.py benchmark:
Type   Filter  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score
INDEL  ALL     0.29347        0.9764            0.694141        0.451297
INDEL  PASS    0.29347        0.9764            0.694141        0.451297
SNP    ALL     0.01267        0.939577          0.955238        0.025003
SNP    PASS    0.01267        0.939577          0.955238        0.025003
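As a quick sanity check on the numbers above, F1 is the harmonic mean of precision and recall, so the reported F1 values follow directly from the other two columns. The sketch below recomputes them from the PASS rows. The function name `f1_score` and the `rows` list are illustrative and not part of hap.py. It suggests the low F1 comes almost entirely from recall (missed variants) rather than from precision.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (type, recall, precision, reported F1) copied from the PASS rows of the benchmark
rows = [
    ("INDEL", 0.29347, 0.9764, 0.451297),
    ("SNP", 0.01267, 0.939577, 0.025003),
]

for variant_type, recall, precision, reported in rows:
    computed = f1_score(recall, precision)
    # The inputs are rounded in the table, so expect agreement only to a few decimals.
    print(f"{variant_type}: computed F1 = {computed:.6f}, reported = {reported}")
```

With precision above 0.93 for both variant types, the calls that were made are mostly correct. The problem is that most true variants are absent from the output VCF.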
What am I missing here?
Valentina