
NaNs when following Re-training Parametric UMAP with landmarks tutorial #1180

Open · EHenryPega opened this issue Jan 24, 2025 · 4 comments

EHenryPega commented Jan 24, 2025

Hey umap team!

Firstly, a big thanks for all the work on this library, it is incredibly useful! The ability to retrain a ParametricUMAP whilst preserving the mapping for embeddings that have already been processed would be incredible.

I tried this out for my own use case, using the example here on umap-learn as a reference. However, when it comes to the retraining phase, the reported loss for every epoch is NaN.

I assumed this was an issue with my own setup, so I copied the example verbatim. Unfortunately I get the exact same outcome. The model does not retrain successfully.

p_embedder.fit(x2_lmk, landmark_positions=landmarks)
Epoch 1/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 21s 5ms/step - loss: nan
Epoch 2/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 20s 5ms/step - loss: nan
Epoch 3/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 20s 5ms/step - loss: nan
Epoch 4/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 20s 5ms/step - loss: nan
Epoch 5/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 20s 5ms/step - loss: nan
Epoch 6/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - loss: nan
Epoch 7/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - loss: nan
Epoch 8/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - loss: nan
Epoch 9/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - loss: nan
Epoch 10/10
3921/3921 ━━━━━━━━━━━━━━━━━━━━ 19s 5ms/step - loss: nan
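
For reference, the setup behind that fit call follows the tutorial's pattern, roughly as sketched below. This is my reconstruction from the docs, with x1 and x2 standing in for the two data batches; the exact helper code in the current docs/notebook may differ.

import numpy as np
from umap.parametric_umap import ParametricUMAP

# Rough sketch of the tutorial's retraining pattern (x1/x2 are placeholder
# batches; the exact construction in the current docs/notebook may differ).
p_embedder = ParametricUMAP()
p_embedder.fit(x1)                                # initial fit on batch 1
prev_embedding = p_embedder.transform(x1)         # positions we want to keep

x2_lmk = np.concatenate([x1, x2])                 # combined data for retraining
landmarks = np.full((len(x2_lmk), 2), np.nan, dtype=np.float32)
landmarks[: len(x1)] = prev_embedding             # NaN rows are free to move

p_embedder.fit(x2_lmk, landmark_positions=landmarks)   # the call shown in the log above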

I suspect there has either been some kind of regression, or the library has received updates that are not reflected in the example.

Any help or suggestions would be greatly appreciated. Cheers!

timsainb (Collaborator) commented Feb 1, 2025

This is related to #1153. Maybe @jacobgolding knows what is going on; I don't have much experience with landmarks yet.

jacobgolding (Contributor) commented

Hello!
I think this might be related to the changes in #1156; it looks like the documentation hasn't been updated to reflect the new helper functions for adding landmarks.
I've set aside some time in the next couple of days to make sure this is the issue, and to remedy it. In the meantime, give the notebook a try instead of the code in the docs:
https://github.com/lmcinnes/umap/blob/a012b9d8751d98b94935ca21f278a54b3c3e1b7f/notebooks/MNIST_Landmarks.ipynb

EHenryPega (Author) commented

Thanks for the reply. I did notice that there were some nice new helper functions in that notebook which make life a lot simpler!

Unfortunately, I still ran into the same issues when using these.

I have been able to run the notebooks successfully on a remote machine. As far as I can tell, the issue is related to my laptop using an M3 chip. I have tried many different TensorFlow builds, from vanilla to those suggested here: https://github.com/ChaitanyaK77/Initializing-TensorFlow-Environment-on-M3-M3-Pro-and-M3-Max-Macbook-Pros.

Unfortunately, I always end up with NaN losses and a broken model when fitting with landmarks.
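
In case it helps others reproduce this, one diagnostic would be to force CPU execution and enable TensorFlow's numeric checks before fitting, to see whether the NaNs come from the Metal backend or from the loss itself. This is only a sketch, not something from the tutorial:

import tensorflow as tf

# Diagnostic sketch (not from the tutorial): hide the Metal GPU so fitting
# runs on CPU, and make TensorFlow raise as soon as a NaN/Inf tensor appears.
tf.config.set_visible_devices([], "GPU")
tf.debugging.enable_check_numerics()

# ...then run p_embedder.fit(x2_lmk, landmark_positions=landmarks) as before.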

jacobgolding (Contributor) commented

After some testing today I mostly just confused myself. I found a couple of things:

  • When I first ran the notebook as-is on the most recent version from my fork, I encountered the same issue as you (on an M2 chip).
  • scikit-learn has updated its check_array function, renaming force_all_finite to ensure_all_finite. This will be a breaking change for UMAP as a whole in 1.8, so there's work to be done to prepare for that (@lmcinnes); a rough compatibility sketch follows this list.
  • Upgrading to scikit-learn 1.6 (the most recent version at the moment) temporarily fixed the NaNs on re-training, but not consistently. I can re-run the same code and get either something that works or something that doesn't.
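
A shim along the following lines is what I mean by preparing for the rename; check_array_compat is my own sketch, not existing UMAP code:

import sklearn
from packaging.version import Version
from sklearn.utils import check_array

# Sketch only: pick whichever keyword the installed scikit-learn understands.
# "allow-nan" matters here because landmark_positions deliberately holds NaNs.
def check_array_compat(X, **kwargs):
    finite_kw = ("ensure_all_finite"
                 if Version(sklearn.__version__) >= Version("1.6")
                 else "force_all_finite")
    kwargs.setdefault(finite_kw, "allow-nan")
    return check_array(X, **kwargs)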

Unfortunately, I won't have much more of a chance to debug this in the near future. The next thing I would try is investigating the default landmark loss function to see what's going on there, perhaps using ops.subtract.
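
For whoever picks this up, below is the sort of NaN-safe masked loss I would test against the default. It assumes the landmark targets use NaN rows to mark non-landmark points; this is a sketch using keras.ops, not the library's actual loss:

from keras import ops

def masked_landmark_loss(landmark_positions, embedding):
    # Sketch of a NaN-safe landmark loss (not the library's actual code):
    # only rows with a defined landmark (no NaNs) contribute to the mean.
    has_landmark = ops.logical_not(ops.any(ops.isnan(landmark_positions), axis=-1))
    # Zero out NaN targets before subtracting so no NaN enters the graph.
    safe_targets = ops.where(ops.isnan(landmark_positions),
                             ops.zeros_like(landmark_positions),
                             landmark_positions)
    sq_err = ops.sum(ops.square(ops.subtract(safe_targets, embedding)), axis=-1)
    sq_err = ops.where(has_landmark, sq_err, ops.zeros_like(sq_err))
    n_landmarks = ops.maximum(ops.sum(ops.cast(has_landmark, sq_err.dtype)), 1.0)
    return ops.sum(sq_err) / n_landmarks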
