Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistency between python and C++ versions #4

Open
dleopold opened this issue Jun 26, 2020 · 4 comments
Open

Consistency between python and C++ versions #4

dleopold opened this issue Jun 26, 2020 · 4 comments

Comments

@dleopold
Copy link

I was able to get the C++ version compiled, but now I am seeing large discrepancies with the results compared to the python version. The problem seems to be with the C++ version. For example, here is the output for 2 phenotypes using the python version:

(base) harpua@m4:~/Projects/Populus_GWAS$ python/bin/python3 python/feather/permutation_testing.py --kinship_eigenvalues output/scratch/e.vals --kinship_eigenvectors output/scratch/e.vecs --phenotypes output/scratch/Y --permutation --samc --n_permutations 100000 
Note: No covariates supplied, using a constant intercept covariate.
Calculating heritability estimates...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 24.56it/s]

Calculating permutation p-values...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [06:31<00:00, 195.90s/it]
n_phen  h2_est  param_p perm_p  RSE                                                                                             
0       0.92585 1.967e-09       4.49e-06        0.03457
1       0.53729 1.094e-14       0.0001163       0.0514

With the exact same data, here is some output from the C++ version:

(base) harpua@m4:~/Projects/Populus_GWAS$ ./python/feather/permutation_testing_samc --eigenvalues_filename output/scratch/e.vals --eigenvectors_filename output/scratch/e.vecs --phenotypes_filename output/scratch/Y --heritabilities_filename output/scratch/heritability_estimates.txt --phenotype_indices 0,1 --n_permutations 1000000 --samc true

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
0.889689
0.772495

Without using samc, jut the fast permutation method, the results are more similar, though the python version seems to consistently give lower p-values over many runs of each method. Here example of the same 2 phenotypes using fast permutations in the python version:

(base) harpua@m4:~/Projects/Populus_GWAS$ python/bin/python3 python/feather/permutation_testing.py  --kinship_eigenvalues output/scratch/e.vals  --kinship_eigenvectors output/scratch/e.vecs  --phenotypes output/scratch/Y --permutation --fast --n_permutations 100000 
Note: No covariates supplied, using a constant intercept covariate.
Calculating heritability estimates...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 22.86it/s]

Calculating permutation p-values...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.54s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.54s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [10:57<00:00, 328.74s/it]
n_phen  h2_est  param_p perm_p                                                                                                  
0       0.92585 1.967e-09       0
1       0.53729 1.094e-14       0.00011

And here is the C++:

(base) harpua@m4:~/Projects/Populus_GWAS$ ./python/feather/permutation_testing_samc --eigenvalues_filename output/scratch/e.vals --eigenvectors_filename output/scratch/e.vecs --phenotypes_filename output/scratch/Y --heritabilities_filename output/scratch/heritability_estimates.txt --phenotype_indices 0,1 --n_permutations 100000

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 0.0016
0.00572
@regevs
Copy link
Contributor

regevs commented Jun 26, 2020 via email

@dleopold
Copy link
Author

Sure. Here is the phenotype file, as well as the eigenvalues and eigenvectors I used in the example posted above. I also just tested the C++ version on the example data and the result was similar to what is shown in the documentation, but I do not see an obvious reason why my data would be behaving differently.

@dleopold
Copy link
Author

I think I have pinpointed the problem. The input files I was using were tab delimited, but it appears that the C++ version of feather does not function properly with this format. I noticed that the example files are space delimited, and when I switched my input files to this format the output is as expected. I do not know C++, so I did not poke around in the code to try and figure out why this is and why the fast permutation in C++ gives closer to correct (but not quite right) output than the samc approach. I think it would be good to generalize the code it to accept tab delimited files or have the program throw an error if the input format in not what is required, particularly given that the tab delimited files are common and work with the python version of feather.

@regevs
Copy link
Contributor

regevs commented Jul 1, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants