some issue about nan #4

Open
shenhaizhongdechanrao opened this issue Mar 13, 2024 · 3 comments

Comments

@shenhaizhongdechanrao

Hello, I am trying to construct a tree from a VCF file obtained by merging with GATK and bcftools. I used the VCF2Dis command to generate a .mat file, but the result contains many '-nan' values. Can you give me some suggestions?

@hewm2008
Contributor

You have too few VCF sites and too many missing genotypes.
It is recommended to generate the VCF directly by joint genotyping the merged gVCFs instead of merging VCFs with bcftools, because bcftools merging leaves many sites missing.
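One way to see why sparse, heavily missing data can yield '-nan': if the pairwise distance is computed only over sites where both samples have a called genotype (an assumption about how VCF2Dis computes its p-distance, not confirmed in this thread), then any pair with zero shared called sites gives 0/0, which prints as nan or -nan. The minimal sketch below counts the shared called sites for every sample pair. The function name `pairwise_shared_sites` is hypothetical; it assumes an uncompressed VCF on stdin and that GT is the first FORMAT field.

```shell
# Hypothetical helper: for every pair of samples, count the sites where
# BOTH samples have a called (non-missing) genotype.
# Assumes an uncompressed VCF on stdin with GT as the first FORMAT subfield.
pairwise_shared_sites() {
  awk -F'\t' '
    /^##/ { next }                        # skip meta-information lines
    /^#CHROM/ {                           # header: samples start at column 10
      n = NF - 9
      for (i = 10; i <= NF; i++) name[i - 9] = $i
      next
    }
    {
      for (i = 1; i <= n; i++) {
        split($(i + 9), f, ":")           # f[1] is the GT subfield
        called[i] = (f[1] !~ /^\./)       # "./.", ".|." or "." count as missing
      }
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++)
          if (called[i] && called[j]) shared[i, j]++
    }
    END {
      # one line per pair: sampleA, sampleB, number of shared called sites
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++)
          printf "%s\t%s\t%d\n", name[i], name[j], shared[i, j] + 0
    }
  '
}
```

For example, `zcat cohort.vcf.gz | pairwise_shared_sites | awk '$3 == 0'` would list the pairs with no overlap (the file name is hypothetical). The inner loop is quadratic in the number of samples, which is fine for a diagnostic pass but slow on very large cohorts.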

@chanity256

Hello, I have run into the same problem. The generated .mat file contains many nan values, so I can't build a tree. Here is my code:
# 1. Merge all gVCFs and perform joint calling

# Find all gVCF files
gvcf_files=$(find $gvcfgz_dir -type f -name "*.gvcf.gz")

# Build the input file list and run the Sentieon command
$SENTIEON_INSTALL_DIR/bin/sentieon driver -t $nt -r $reference \
    --algo GVCFtyper \
    $(for file in $gvcf_files; do echo -n "-v $file "; done) \
    $output_vcf/${name_merged}.vcf

# 2. Extract SNPs from the merged VCF with SelectVariants
gatk --java-options "-Xmx50g" SelectVariants -R $reference --select-type-to-include SNP -V $output_vcf/${name_merged}.vcf -O $output_vcf/${name_merged}.snp.vcf

# 3. Hard-filter SNPs with VariantFiltration, then remove the low-quality SNPs (the lines tagged SNP_Filter)
gatk --java-options "-Xmx50g" VariantFiltration -V $output_vcf/${name_merged}.snp.vcf --filter-expression 'QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0' --filter-name 'SNP_Filter' -O $output_vcf/${name_merged}.snp.filtering.vcf

grep -v "SNP_Filter" $output_vcf/${name_merged}.snp.filtering.vcf > $output_vcf/${name_merged}.snp.filtered.vcf

# 4. Further filtering with vcftools
# (with --recode, vcftools writes the result to <--out prefix>.recode.vcf)
vcftools --vcf $output_vcf/${name_merged}.snp.filtered.vcf --max-missing 0.2 --minQ 30 --remove-indels --min-alleles 2 --max-alleles 2 --maf 0.05 --recode --recode-INFO-all --out $output_vcf/${name_merged}.snp.filtered.miss0.2maf0.05.vcf

Is the problem caused by my vcftools filtering settings?
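On the vcftools question: `--max-missing` takes the minimum fraction of samples that must have a called genotype at a site (1 means no missing data allowed, 0 allows completely missing sites), so `--max-missing 0.2` keeps sites where up to 80% of genotypes are missing. The awk sketch below imitates that rule so the effect of different thresholds can be checked on a small file. It is an approximation, not vcftools itself; the function name `filter_min_called` is hypothetical, and it assumes an uncompressed VCF with GT as the first FORMAT field.

```shell
# Hypothetical helper: keep all header lines, plus the sites whose fraction
# of called (non-missing) genotypes is at least $1 (a value between 0 and 1).
# This approximates how vcftools interprets --max-missing.
filter_min_called() {
  awk -F'\t' -v min="$1" '
    /^#/ { print; next }                  # pass every header line through
    {
      n = NF - 9; called = 0              # sample columns start at column 10
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                 # f[1] is the GT subfield
        if (f[1] !~ /^\./) called++       # "./.", ".|." or "." are missing
      }
      if (n > 0 && called / n >= min) print
    }
  '
}
```

For example, `filter_min_called 0.8 < in.vcf > out.vcf` keeps only sites genotyped in at least 80% of samples; the corresponding vcftools setting would be `--max-missing 0.8`.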

@hewm2008
Contributor

hewm2008 commented Sep 9, 2024

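This is a problem with your data.
Most likely one sample in your VCF has severe missingness: either that sample's sequencing depth is far too low, or it is an outgroup so distant that its reads barely map to the reference. I suggest filtering out samples in which fewer than 50% of reads have mapping quality (MAPQ) > 10.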
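To check whether a single sample is badly missing, the per-sample fraction of missing genotypes can be tabulated. `vcftools --missing-indv` already reports this (the F_MISS column of `out.imiss`); the awk sketch below computes the same quantity without extra tools. The function name `sample_missingness` is hypothetical, and it assumes an uncompressed VCF on stdin with GT as the first FORMAT field.

```shell
# Hypothetical helper: print each sample's fraction of missing genotypes
# (comparable to F_MISS in the out.imiss file from `vcftools --missing-indv`).
sample_missingness() {
  awk -F'\t' '
    /^##/ { next }                        # skip meta-information lines
    /^#CHROM/ {                           # remember sample names by column
      for (i = 10; i <= NF; i++) name[i] = $i
      last = NF
      next
    }
    {
      sites++
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                 # f[1] is the GT subfield
        if (f[1] ~ /^\./) miss[i]++       # "./.", ".|." or "." are missing
      }
    }
    END {
      for (i = 10; i <= last; i++)
        printf "%s\t%.3f\n", name[i], (sites ? miss[i] / sites : 0)
    }
  '
}
```

Samples with a missing fraction close to 1 are candidates to drop, for example with `vcftools --remove-indv NAME` or `bcftools view -s ^NAME`. For the mapping-quality criterion, `samtools view -c -q 11 sample.bam` counts reads with MAPQ ≥ 11 (that is, > 10), and dividing by `samtools view -c sample.bam` gives a fraction to compare against 50%; that total also includes unmapped and secondary records, so treat the ratio as approximate.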


3 participants