
Errors with reloading EMBEDR, Simes' method, question about figures #12

Open
Ebony-Watson opened this issue Apr 15, 2022 · 3 comments

@Ebony-Watson

Ebony-Watson commented Apr 15, 2022

Hi,

I love the idea of EMBEDR and am very excited to try it out. I have a question and a couple of errors I am trying to deal with though.

  1. I was really hoping to reproduce some of the figures from the paper's supplementary material, specifically the cell-wise ones (e.g. Figs. S13 and S14). The supplement says that all of these figures are available in the GitHub repository, but I cannot find them. Could you point me towards them, and the scripts used to produce them?

  2. When setting pVal_type='simes', embObj.fit raises the following error:

 ~\AppData\Local\Temp/ipykernel_18496/4265488886.py in <module>
     24                     pVal_type='simes',
     25                     verbose=3)
---> 26     embObj.fit(datasets[dat]) ## Use 'fit' to generate the embeddings.
     27 
     28     embObjs[(alg, dat)] = embObj

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in fit(self, X)
    288 
    289         ## Get p-Values
--> 290         self.calculate_pValues()
    291 
    292     def _validate_with_data(self, X):

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in calculate_pValues(self)
   1840             simes_mult = n_embeds / np.arange(1, n_embeds + 1).reshape(-1, 1)
   1841             pVal_idx = np.argsort(pVals, axis=0)
-> 1842             summ_pVals = np.min(pVals[pVal_idx] * simes_mult, axis=0)
   1843 
   1844             self._pValues = pVals[:]
ValueError: operands could not be broadcast together with shapes (5,1000,1000) (5,1)

  3. When I run fit on a previously run EMBEDR project to load it back in, I get the following error:

FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18496/4265488886.py in <module>
     24                     pVal_type='simes',
     25                     verbose=3)
---> 26     embObj.fit(datasets[dat]) ## Use `fit` to generate the embeddings.
     27 
     28     embObjs[(alg, dat)] = embObj

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in fit(self, X)
    279 
    280         ## Finally, we can do the computations!
--> 281         self._fit(null_fit=False)
    282 
    283         #####################

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in _fit(self, null_fit)
    516 
    517             ## Basically, no matter what, we need the kNN graph
--> 518             self.data_kNN = self.get_kNN_graph(self.data_X)
    519 
    520             ## If we're using t-SNE to embed or DKL as the EES, we need an

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in get_kNN_graph(self, X, null_fit)
    620         ## If we're doing file caching, first we want to try and load the graph
    621         if self.do_cache:
--> 622             loaded_kNN = self.load_kNN_graph(X,
    623                                              kNNObj=kNNObj,
    624                                              seed=seed,

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in load_kNN_graph(self, X, kNNObj, seed, null_fit, raise_error)
    752 
    753         ## If a path has been found to a matching kNN graph load it!
--> 754         with open(kNN_path, 'rb') as f:
    755             kNNObj = pkl.load(f)
    756             kNNObj.verbose = self.verbose

FileNotFoundError: [Errno 2] No such file or directory: './EMBEDR/projects/Run_3_180422_simes/tSNE_D_Sim\\57393bff6c57e524d74111f146808481\\tSNE_D_Sim\\57393bff6c57e524d74111f146808481\\Data_kNN_0000.knn'

  4. Do you have notebooks/scripts for your implementations of the other DR methods? I would like to test a new method with EMBEDR, and these would be super helpful.
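The broadcasting error in #2 can be reproduced outside EMBEDR. This is a minimal sketch, not a confirmed upstream fix: it assumes, as the traceback suggests, that pVals has shape (n_embeds, n_samples) and that the goal is Simes' combination across embeddings, min over i of n * p_(i) / i. Indexing pVals with the 2-D output of np.argsort(..., axis=0) is integer-array indexing along the first axis, so every index selects a whole row and the result has shape (5, 1000, 1000), which cannot broadcast against the (5, 1) multiplier. np.take_along_axis (or simply np.sort(pVals, axis=0)) applies the sort per column instead. simes_combine is a hypothetical helper, not part of EMBEDR.

```python
import numpy as np

def simes_combine(pVals):
    """Hypothetical helper: Simes-combine p-values across embeddings.

    pVals: array of shape (n_embeds, n_samples). Returns shape (n_samples,).
    """
    pVals = np.asarray(pVals, dtype=float)
    n_embeds = pVals.shape[0]
    # n / i for ranks i = 1..n, shaped (n_embeds, 1) so it broadcasts over samples
    simes_mult = n_embeds / np.arange(1, n_embeds + 1).reshape(-1, 1)
    # Sort each sample's p-values independently along the embedding axis
    sorted_pVals = np.sort(pVals, axis=0)
    return np.min(sorted_pVals * simes_mult, axis=0)

rng = np.random.default_rng(0)
pVals = rng.uniform(size=(5, 1000))   # 5 embeddings x 1000 samples, as in the traceback
pVal_idx = np.argsort(pVals, axis=0)

# Integer-array indexing on axis 0 selects whole rows: shape (5, 1000, 1000)
print(pVals[pVal_idx].shape)

# take_along_axis applies each column's sort order instead: shape (5, 1000)
print(np.take_along_axis(pVals, pVal_idx, axis=0).shape)

print(simes_combine(pVals).shape)     # (1000,)
```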

If you could give me a hand with any of these, it would be great!

Thanks!
Ebony

@Ebony-Watson Ebony-Watson changed the title Pre-processing & input to EMBEDR? Pre-processing & input to EMBEDR, Simes' method Apr 18, 2022
@Ebony-Watson Ebony-Watson changed the title Pre-processing & input to EMBEDR, Simes' method Errors with reloading EMBEDR, Simes' method, question about figures Apr 18, 2022
@ejohnson643
Owner

Hello!

Sorry for the delay in responding!

  1. I will look into finding and posting those figure-generating scripts.
  2. Can you tell me the initialization you used and the size of the data that generated this error?
  3. This looks like a bug in the path-finding code... Again, if you could send me the full script that generated this error, I might be able to diagnose the issue better.
  4. I was waiting for an update to UMAP before adding that example, but it appears to have been updated, so I will generate and post that shortly!

@Ebony-Watson
Author

Hi,

No worries, thanks for getting back to me!

Re #3: I figured it out. It was just an issue with how Windows handles path separators in '.split("/")[-1]'; I changed it to '.split(os.sep)[-1]' and have had no more problems there.
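The separator problem described in the comment above can be illustrated with the standard library alone. This is a minimal sketch, not EMBEDR's actual path-handling code: file_name_from_path is a hypothetical helper, and the example paths reuse segments from the error in #3. On a path containing no "/", split("/")[-1] returns the entire path rather than the file name, which is consistent with the duplicated tSNE_D_Sim\\57393bff... segments in that error. Splitting on os.sep works when every separator is a backslash, but ntpath.basename (which is what os.path.basename resolves to on Windows) also copes with mixed separators.

```python
import ntpath

def file_name_from_path(path: str) -> str:
    # Hypothetical helper. ntpath applies Windows rules, so both "\\" and "/"
    # count as separators regardless of which OS this script runs on.
    return ntpath.basename(path)

# Hypothetical Windows-style relative path built from segments in the error above
win_path = "tSNE_D_Sim\\57393bff6c57e524d74111f146808481\\Data_kNN_0000.knn"

# No "/" present, so the whole path comes back instead of just the file name
print(win_path.split("/")[-1] == win_path)   # True
print(file_name_from_path(win_path))         # Data_kNN_0000.knn

# Mixed separators: splitting on "\\" alone (os.sep on Windows) misses the final "/"
mixed = "tSNE_D_Sim\\57393bff6c57e524d74111f146808481/Data_kNN_0000.knn"
print(mixed.split("\\")[-1])                 # 57393bff6c57e524d74111f146808481/Data_kNN_0000.knn
print(file_name_from_path(mixed))            # Data_kNN_0000.knn
```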

For #2, the error has occurred with datasets ranging from 1000 cells × 10 PCs up to 3500 cells × 3700 genes. For the initialisation, I've tried:

embObj = EMBEDR(DRA=alg,              # error occurs with both t-SNE and UMAP
                n_jobs=-1,
                random_state=96,      # have tried multiple
                n_data_embed=5,
                n_null_embed=5,
                n_components=2,
                perplexity=perp,      # have tried multiple from 10 to 300
                n_neighbors=n_neib,   # have tried multiple from 31 to 999
                pVal_type='simes',
                verbose=3,
                project_name=f'{alg}_{dat}',
                project_dir=project_dir)

I'm running Python 3.8.13 with JupyterLab 3.3.4.

Thanks, and I saw on Twitter that you recently defended, so congratulations!
Ebony

@Ebony-Watson
Author

Also, I was wondering why you chose not to apply a multiple-testing adjustment to the p-values in the paper. This is why I originally wanted to see how using Simes' method changed the results. As I couldn't get it working, I instead performed a BH adjustment on the raw p-values before averaging across the embeddings, and saw a large difference for some datasets. Does the way the p-value is derived make adjustment unnecessary?
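A minimal sketch of the "BH-adjust, then average across embeddings" procedure described above. It assumes the p-values are laid out as (n_embeds, n_cells), as in the Simes traceback, and that the correction is applied across cells within each embedding; that reading of the comment is an assumption. For contrast, EMBEDR's Simes option, as far as the traceback shows, combines each cell's p-values across embeddings (axis 0) rather than correcting across cells, so the two answer different questions. bh_adjust is a hypothetical hand-written helper; statsmodels' multipletests with method='fdr_bh' is an established alternative.

```python
import numpy as np

def bh_adjust(p):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values for a 1-D array."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted p-values
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Hypothetical p-values: one row per embedding, one column per cell
rng = np.random.default_rng(0)
pVals = rng.uniform(size=(5, 1000))

# Adjust across cells within each embedding, then average over embeddings
bh_per_embed = np.apply_along_axis(bh_adjust, 1, pVals)
mean_bh = bh_per_embed.mean(axis=0)
print(mean_bh.shape)                       # (1000,)

# Small worked case: approx. [0.04, 0.0533, 0.0533, 0.2]
print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```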

Sorry for all the questions!
Ebony


2 participants