Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LD values are NA for r2 EM and D prime EM #44

Open
aguilar-gomez opened this issue Mar 11, 2023 · 5 comments
Open

LD values are NA for r2 EM and D prime EM #44

aguilar-gomez opened this issue Mar 11, 2023 · 5 comments

Comments

@aguilar-gomez
Copy link

aguilar-gomez commented Mar 11, 2023

Hi,

I ran ngsLD and am interested in the r2 from the EM algorithm. When I am looking at the values for some positions, the D is negative, Dprime and r2 are nan. Do you know what is happening or how to fix it?
I printed the extended output to see if that helps.

pos1 pos2 distance r2p D Dp r2 nsamples mafpos1 mafpos2 hap00 hap01 hap10 hap11 hap_maf1 hap_maf2 chi2 loglike nIter
P_RNA_scaffold_10726:56595 P_RNA_scaffold_10726:58826 2231 0.103631 -0.045278 NA NA 49 0.664285 0.155807 0.206242 0.069226 0.706824 0.017708 0.724532 0.086934 0.129409 0 11
P_RNA_scaffold_10726:56585 P_RNA_scaffold_10726:58826 2241 0.118901 -0.045282 NA NA 49 0.667382 0.155807 0.206464 0.069283 0.706497 0.017757 0.724254 0.08704 0.129204 0 11
P_RNA_scaffold_10726:35711 P_RNA_scaffold_10726:58826 23115 0.091501 -0.028652 NA NA 46 0.657558 0.155807 0.232471 0.051776 0.686177 0.029577 0.715753 0.081353 0.053992 0 11
P_RNA_scaffold_10726:23779 P_RNA_scaffold_10726:58826 35047 0.12464 -0.044908 NA NA 49 0.687809 0.155807 0.199039 0.06824 0.713665 0.019055 0.73272 0.087295 0.129248 0 12
@fgvieira
Copy link
Owner

Can you send the ngsLD command you used?

@aguilar-gomez
Copy link
Author

ngsLD --n_threads 10 --geno CMScaffoldsofInterestMaf5.beagle.gz --n_ind 63 --n_sites 1145 --outH CMLDMaf5probs_thre --pos scaffoldsOfInterestmaf5.sites --max_kb_dist 0 --max_snp_dist 0 --extend_out --probs --call_geno --call_thresh .9 --min_maf 0.1 --ignore_miss_data

@fgvieira
Copy link
Owner

The problem is that site P_RNA_scaffold_10726:58826 has a lower freq (when calculated through the haplotypes) than --min_maf 0.1. The idea was to exclude LD from sites with very low freq, but I guess it might be better to leave that filtering to the user.

@aguilar-gomez
Copy link
Author

Why is it differing the maf calculated by angsd vs the haplotype based calculation?
I think it would be useful that position was completely excluded from the output, having the nan just makes it harder for plotting and interpreting. Or maybe having also a column with the calculated maf by haplotypes to understand what is happening.

@fgvieira
Copy link
Owner

The positions where maf is lower than --min_maf are excluded, but there is no filter on haplot freqs because these are calculated pairwise between all pairs of SNPs.

And you do have the haplot freqs in the extended output (hap_maf1 and hap_maf2).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants