NaNs when following the Re-training Parametric UMAP with landmarks tutorial #1180
This is related to #1153. Maybe @jacobgolding knows what is going on; I don't have much experience with landmarks yet.
Hello!
Thanks for the reply. I did notice that there were some nice new helper functions in that notebook which make life a lot simpler! Unfortunately, I still ran into the same issues when using them. I have been able to run the notebooks successfully on a remote machine, so as far as I can tell, the issue is related to my laptop using an M3 chip. I have tried many different TensorFlow installations, from vanilla TensorFlow to the setups suggested here: https://github.com/ChaitanyaK77/Initializing-TensorFlow-Environment-on-M3-M3-Pro-and-M3-Max-Macbook-Pros. Unfortunately, I always end up with NaNs for the loss and a broken model when fitting the model with landmarks.
After some testing today I mostly just confused myself. I found a couple of things:
Unfortunately, I won't have much chance to debug this further in the near future. The next thing I would try is investigating the default landmark loss function to see what's going on there, perhaps using ops.subtract.
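One thing worth checking when inspecting the landmark loss is how placeholder values for points without a landmark interact with masking. As an illustration only, assume rows without a landmark are filled with NaN. If the mask is applied after subtracting, the loss is still NaN, because NaN * 0 is NaN in IEEE floating point. The NumPy sketch below contrasts that with substituting a finite value before subtracting. It is not the umap-learn implementation, and both function names are hypothetical. In a TensorFlow/Keras loss the ordering matters for gradients too, since masking after a NaN-producing operation can still yield NaN gradients.

```python
import numpy as np


def naive_masked_landmark_loss(landmarks, embedding):
    """Hypothetical naive loss: mask AFTER subtracting (NaN * 0 is still NaN)."""
    # True for rows that have a landmark (assumption: missing rows are all-NaN)
    mask = ~np.isnan(landmarks).any(axis=1)
    sq_dist = np.sum((landmarks - embedding) ** 2, axis=1)  # NaN for masked rows
    return np.sum(mask * sq_dist) / max(mask.sum(), 1)


def nan_safe_landmark_loss(landmarks, embedding):
    """Hypothetical NaN-safe loss: replace missing landmarks BEFORE subtracting."""
    mask = ~np.isnan(landmarks).any(axis=1)
    # Substitute zeros for missing landmarks so no NaN enters the arithmetic,
    # then zero out those rows' contributions with the mask.
    safe_landmarks = np.where(mask[:, None], landmarks, 0.0)
    sq_dist = np.sum((safe_landmarks - embedding) ** 2, axis=1)
    return np.sum(mask * sq_dist) / max(mask.sum(), 1)


# Two points have landmarks; the third has none (NaN placeholder).
landmarks = np.array([[0.0, 0.0], [1.0, 1.0], [np.nan, np.nan]])
embedding = np.array([[0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])

print(naive_masked_landmark_loss(landmarks, embedding))  # nan
print(nan_safe_landmark_loss(landmarks, embedding))  # 0.5
```

The safe version averages only over rows that actually have a landmark, so the third point contributes nothing and the result is (1 + 0) / 2.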
Hey umap team!
Firstly, a big thanks for all the work on this library, it is incredibly useful! The ability to retrain a ParametricUMAP whilst preserving the mapping for embeddings that have already been processed would be incredible.
I tried this out for my own use case, using the example here on umap-learn as a reference. However, when it came to the retraining phase, the reported loss for every epoch was always nan. I assumed this was an issue with my own setup, so I copied the example verbatim. Unfortunately, I got exactly the same outcome: the model does not retrain successfully.
I suspect there has either been some kind of regression or there have been some updates to the library that are not reflected in the example.
Any help or suggestions would be greatly appreciated. Cheers!