Most called variants have very low quality score #936

Jiayi-Wang-Joey · 2025-02-18T10:23:24Z

Dear DeepVariant team,

Thanks for your great work. I am running DeepVariant on PacBio MAS-Seq scRNA-seq data (at the pseudo-bulk level).
This is my command:

singularity exec --bind /usr/lib/locale/ deepvariant-1.8.0.simg /opt/deepvariant/bin/run_deepvariant \
             --model_type MASSEQ \
             --ref {input.ref} \
             --reads {input.bam} \
             --output_vcf {output} \
             --num_shards {params.threads} \
             --intermediate_results_dir /home/jiayiwang/tmp/{wildcards.sample} \
         > {log} 2>&1

For example, for one sample, after some filters (coverage, dbSNP, etc.) I got 214241 variants. When I then filtered on QUAL >= 10, only 28 variants remained; when I filtered on PASS instead, only 355 remained. Other samples have similar passing rates. Applying the same filters to the results from Clair3-RNA still leaves 150830 variants. I therefore suspect that the very small number of high-quality (or PASS) variants from DeepVariant indicates a problem.

(Could this be because I didn't run splitNC and flagCorrection on my BAMs? I tried running them, but they seemed to take ages, so I decided to try without them.)

Do you have any idea about this?

Thanks in advance!

Kind regards,
Jiayi


AndrewCarroll · 2025-02-18T23:52:13Z

Hi @Jiayi-Wang-Joey

That indeed doesn't sound right. I don't think I've seen 200k entries go to 355 passing variants. Something quite unusual is happening at some point in the process.

The model should be robust to whether you run splitNC and flagCorrection: it is trained on examples both with and without those steps applied. As an aside, when I run splitNC + flagCorrection, I use a separate process per chromosome. That makes it much faster, and my main remaining issue with the process is the larger storage size of the BAMs.
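For readers who want to try the per-chromosome approach, here is a minimal sketch of the splitting step only. It assumes an indexed BAM and `samtools` on the PATH; the function names `split_command` and `run_per_chromosome` are hypothetical, and the splitNC and flagCorrection invocations themselves are deliberately left out because their exact commands are not given in this thread.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def split_command(bam, chrom, out_dir):
    """Build a samtools command that extracts one chromosome from an indexed BAM."""
    return ["samtools", "view", "-b", "-o", f"{out_dir}/{chrom}.bam", bam, chrom]


def run_per_chromosome(bam, chroms, out_dir, workers=4, runner=subprocess.run):
    """Run one split per chromosome concurrently; results come back in input order.

    `runner` is injectable so the function can be exercised without samtools.
    """
    commands = [split_command(bam, chrom, out_dir) for chrom in chroms]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))
```

Each per-chromosome BAM could then be passed to the downstream steps in its own process, which is where the extra storage mentioned above comes from.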

Is it possible to share the DeepVariant VCF with [email protected]? If the sample is sensitive, then no need to share. But from this description alone it's hard to see what might have happened.
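A quick check that may help narrow this down is to tally the FILTER column and count records at or above a QUAL threshold, since DeepVariant writes non-PASS records (RefCall, for instance) to the VCF alongside PASS calls. The sketch below is an illustration only: `summarize_vcf` is a hypothetical helper, it reads plain-text VCF lines, and the default threshold of 10 simply mirrors the filter described in the question.

```python
from collections import Counter


def summarize_vcf(lines, min_qual=10.0):
    """Return (FILTER value counts, number of records with QUAL >= min_qual)."""
    filters = Counter()
    n_qual = 0
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qual, filter_value = fields[5], fields[6]  # VCF columns 6 and 7
        filters[filter_value] += 1
        # A missing QUAL is written as "." and is not counted.
        if qual != "." and float(qual) >= min_qual:
            n_qual += 1
    return filters, n_qual
```

For an uncompressed file, something like `with open("sample.vcf") as fh: print(summarize_vcf(fh))` would print both summaries (the filename is a placeholder); a bgzipped VCF can be opened with `gzip.open(path, "rt")` instead.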

Jiayi-Wang-Joey · 2025-02-27T10:23:42Z

Dear Andrew, I have sent you the VCF file. It would be great if you had time to look at it. Thanks!

pichuan assigned AndrewCarroll Feb 20, 2025
