hap.py benchmarking #896

alisamatisse · 2024-10-18T12:37:31Z

Hello,

Sorry for writing again about some questions in the analysis, I don't have any expertise on DeepVariant/variant benchmarking around me :(

I wanted to benchmark identified variants using hap.py as you suggest in the tutorial, and the output surprised me a bit. For the variants I called the recall/precision were from 0 to 0.4, which is very low... I did not change default pacbio parameters when running DeepVariant, except for asking to keep supplementary alignments. Maybe it is partially because of the complexity of HLA region, but I am not sure what is the reason. What do you think, is there something obviously wrong?

singularity exec --bind /usr/lib/locale/ \
docker://google/deepvariant:${BIN_VERSION} \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type PACBIO \
    --ref $REFERENCE \
    --reads $BAM_FILE \
    --make_examples_extra_args=keep_supplementary_alignments=true \
    --output_vcf  $VCF_FILE \
    --num_shards 12 \
    --regions chr6:32541543-32701886

I am using version 1.6.1.

Kind regards,
Alisa

The text was updated successfully, but these errors were encountered:

lucasbrambrink · 2024-10-18T17:52:05Z

Great question, always happy to help!

Setting keep_supplementary_alignments=true shouldn't dramatically change the accuracy.

Is it possible for you to share the BAM and VCF/truth set used? One possibility is that the truth set doesn't match HG003. For example if you run happy for HG003 using HG002 truth set the accuracy would be around 0.4.

jrharting · 2024-10-25T17:31:46Z

That particular region in HLA is likely to have multiple different alleles mapped to it due to the diversity of structural variation in the DRB loci. Hg38 (no_alts) only has DRB5 in chr6. If your sample has DRB1 alleles that are linked to DRB3/4, these reads often map here.

(see fig 4) https://pmc.ncbi.nlm.nih.gov/articles/PMC10849470/

alisamatisse · 2024-10-25T22:33:38Z

Hi Lucas and John,

Thank you so much for your input. I’m currently trying to figure out what’s happening… with different references..!

Kind regards,
Alisa

alisamatisse · 2025-03-07T17:17:49Z

Hey, sorry to reopen the issue. I called variants against HLA class II genes region for HG003 using HG003_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed

and have very low precision values :( probably I am doing wrong variant calling in the HLA...
also for some reason my coverage for this region is also very low, although WG is okay. I am not sure if it was reported elsewhere.

sorry for dumb questions, I am trying to figure this out on my own (and with AI)

kishwarshafin · 2025-03-11T17:17:44Z

@alisamatisse ,

You need to provide the HLA bed with -T option in hap.py cause it seems like it's comparing against everything. Can you paste your hap.py command here?

alisamatisse closed this as completed Oct 28, 2024

alisamatisse reopened this Mar 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hap.py benchmarking #896

hap.py benchmarking #896

alisamatisse commented Oct 18, 2024

lucasbrambrink commented Oct 18, 2024

jrharting commented Oct 25, 2024

alisamatisse commented Oct 25, 2024 •

edited

Loading

alisamatisse commented Mar 7, 2025

kishwarshafin commented Mar 11, 2025

hap.py benchmarking #896

hap.py benchmarking #896

Comments

alisamatisse commented Oct 18, 2024

lucasbrambrink commented Oct 18, 2024

jrharting commented Oct 25, 2024

alisamatisse commented Oct 25, 2024 • edited Loading

alisamatisse commented Mar 7, 2025

kishwarshafin commented Mar 11, 2025

alisamatisse commented Oct 25, 2024 •

edited

Loading