Most called variants have very low quality score #936

Open
Jiayi-Wang-Joey opened this issue Feb 18, 2025 · 2 comments

@Jiayi-Wang-Joey

Dear DeepVariant team,

Thanks for your great work. I am running DeepVariant on PacBio MAS-Seq scRNA-seq data (at the pseudobulk level).
This is my command:

singularity exec --bind /usr/lib/locale/ deepvariant-1.8.0.simg /opt/deepvariant/bin/run_deepvariant \
    --model_type MASSEQ \
    --ref {input.ref} \
    --reads {input.bam} \
    --output_vcf {output} \
    --num_shards {params.threads} \
    --intermediate_results_dir /home/jiayiwang/tmp/{wildcards.sample} \
    > {log} 2>&1

For example, for one sample, after applying some filters (coverage, dbSNP, etc.) I got 214241 variants. When I then set a filter of QUAL >= 10, only 28 variants remain. When I set the filter to PASS, I also get only 355 variants. Other samples have similar passing rates. Applying the same filters to the results from Clair3-RNA still leaves 150830 variants. I therefore suspect that the very small number of high-quality (or PASS) variants from DeepVariant points to a problem.
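
To make counts like these easier to compare, here is a minimal sketch that tallies the FILTER column and counts records at or above a QUAL threshold. It assumes a standard tab-delimited VCF (QUAL in column 6, FILTER in column 7); the function names `summarize_vcf` and `open_vcf` are hypothetical helpers, not part of DeepVariant.

```python
import gzip
from collections import Counter


def summarize_vcf(lines, qual_threshold=10.0):
    """Tally FILTER values and count records with QUAL >= qual_threshold.

    `lines` is any iterable of VCF text lines (for example an open file handle).
    """
    filters = Counter()
    total = 0
    qual_pass = 0
    for line in lines:
        # Skip blank lines and header lines ("##meta" and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # malformed record: fewer than the 7 columns needed here
        total += 1
        qual, flt = fields[5], fields[6]
        filters[flt] += 1
        # "." means QUAL is missing; such records never meet the threshold.
        if qual != "." and float(qual) >= qual_threshold:
            qual_pass += 1
    return {"total": total, "qual_pass": qual_pass, "filters": dict(filters)}


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")
```

For a file on disk, something like `with open_vcf("sample.vcf.gz") as handle: print(summarize_vcf(handle))` would print the breakdown (the file name is a placeholder). Seeing which FILTER values account for the non-PASS records may help narrow down where the variants are being lost.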

(Could this be because I didn't run splitNC and flagCorrection on my BAMs? I tried to run them, but they seemed to take ages, so I decided to try without them.)

Do you have any idea about this?

Thanks in advance!

Kind regards,
Jiayi

@AndrewCarroll
Collaborator

Hi @Jiayi-Wang-Joey

That indeed doesn't sound right. I don't think I've seen 200k entries go to 355 passing variants. Something quite unusual is happening at some point in the process.

The model should be robust to whether you run splitNC and flagCorrection: it is trained on examples both with and without those steps applied. As an aside, when I run splitNC + flagCorrection, I use a separate process per chromosome. This is much faster, and my main remaining issue with the process is the larger storage size of the BAMs.
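
A rough sketch of the per-chromosome approach follows. It assumes that "splitNC" refers to GATK's `SplitNCigarReads` and that the input BAM is indexed so `-L` can restrict each job to one contig. The helper names, output naming scheme, and worker count are hypothetical, and the flagCorrection step is left out because its command line is not given in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_split_commands(ref, bam, contigs, out_dir):
    """Build one `gatk SplitNCigarReads` command per contig (nothing is run here)."""
    commands = []
    for contig in contigs:
        # Hypothetical naming scheme: one output BAM per contig.
        out_bam = f"{out_dir}/{contig}.splitN.bam"
        commands.append([
            "gatk", "SplitNCigarReads",
            "-R", ref,      # reference FASTA
            "-I", bam,      # input BAM (must be indexed for -L)
            "-L", contig,   # restrict this job to a single contig
            "-O", out_bam,
        ])
    return commands


def run_all(commands, max_workers=4):
    """Run the commands concurrently and return their exit codes in order."""
    def run_one(cmd):
        return subprocess.run(cmd, check=False).returncode

    # Threads are sufficient: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

Calling `run_all(build_split_commands(ref, bam, contigs, out_dir))` would launch the jobs. The per-contig BAMs would then need the flagCorrection step and, if a single file is wanted, merging afterwards.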

Is it possible to share the DeepVariant VCF with [email protected]? If the sample is sensitive, then there is no need to share. But from this description alone it's hard to see what might have happened.

@Jiayi-Wang-Joey
Author

Dear Andrew, I have sent you the VCF file. It would be great if you have time to look at it. Thanks!
