
Curse of dimensionality with Nearest Neighbour approaches? #13

Open · mr-september opened this issue May 3, 2022 · 2 comments

Comments

@mr-september

Hi Authors,

Sorry this is less of a technical issue than a conceptual question.

You have alluded to the curse of dimensionality in the original paper, one major component of which is the dilution of "distances" (e.g. *On the Surprising Behavior of Distance Metrics in High Dimensional Space*, or one of the more accessible summaries of it).

Step 1 of EMBEDR relies on calculating NNs; the code appears to rely on the standard scipy-style distance metrics, e.g. "euclidean", "l2", "sqeuclidean", …, "sokalsneath", "yule".

Do you have recommendations among these distance metrics to mitigate the "curse"? Or are there other parts of the algorithm that help with it?
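
For concreteness, here is a minimal sketch of what swapping the NN metric looks like and one way to measure its effect. This assumes scikit-learn's `NearestNeighbors` (EMBEDR's own NN code may differ); the data and names are invented for the illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))  # 500 points in 100 dimensions

def knn_indices(X, k, metric):
    # Ask for k+1 neighbours, then drop each point's self-match in column 0.
    nbrs = NearestNeighbors(n_neighbors=k + 1, metric=metric).fit(X)
    _, idx = nbrs.kneighbors(X)
    return idx[:, 1:]

k = 15
nn_euclid = knn_indices(X, k, "euclidean")
nn_cosine = knn_indices(X, k, "cosine")

# Mean per-point Jaccard overlap between the two neighbour sets.
overlap = np.mean([
    len(set(a) & set(b)) / len(set(a) | set(b))
    for a, b in zip(nn_euclid, nn_cosine)
])
print(f"mean k-NN overlap, euclidean vs cosine: {overlap:.2f}")
```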

@ejohnson643 (Owner)

Hello!

Thanks for your question! Yes, the curse of dimensionality definitely impacts the accuracy of NN identification; however, the "fuzzy" similarities used by t-SNE and UMAP help to circumvent this problem. EMBEDR uses these same similarities to assess the quality of the embeddings. Furthermore, EMBEDR repeats the embedding process several times, which ideally will average out some of these NN errors.
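
To make the "fuzzy similarities" point concrete, here is a minimal numpy sketch of the standard perplexity-calibrated Gaussian affinities that t-SNE builds on the k-NN graph. This illustrates the general technique, not EMBEDR's own implementation; the function name and defaults are invented for the example:

```python
import numpy as np

def tsne_affinities(sq_dists, perplexity=30.0, tol=1e-5, max_iter=64):
    """Row-stochastic p_{j|i} from each point's squared distances to its
    k nearest neighbours (self excluded). Each bandwidth sigma_i is set by
    binary search so the entropy of row i equals log(perplexity); note that
    perplexity must be below k for the search to converge."""
    P = np.zeros_like(sq_dists, dtype=float)
    target = np.log(perplexity)
    for i, d in enumerate(sq_dists):
        lo, hi = 0.0, np.inf
        beta = 1.0                               # beta_i = 1 / (2 * sigma_i^2)
        for _ in range(max_iter):
            p = np.exp(-beta * (d - d.min()))    # shift for numerical stability
            p /= p.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(entropy - target) < tol:
                break
            if entropy > target:                 # too flat: narrow the kernel
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (lo + hi) / 2.0
            else:                                # too peaked: widen the kernel
                hi = beta
                beta = (lo + hi) / 2.0
        P[i] = p
    return P
```

Because each neighbour gets a graded weight rather than a hard in-or-out membership, a few mis-ranked neighbours in high dimensions shift the row weights only slightly instead of flipping entries of the similarity graph outright.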

More concretely, in testing we did not find qualitative differences in the algorithm's output when we changed the metric used to find NNs. Depending on the data you are analyzing, a different metric may be more appropriate (the Jaccard distance for analyzing documents, for example), but in general, using the same metric as your dimensionality reduction method of choice is the best option.
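
As a hypothetical sketch of that advice with umap-learn (its `metric` and `n_neighbors` parameters are part of its public API; the values here are illustrative, not a recommendation from this study):

```python
import umap

# Pick one metric and use it for both the NN graph and the embedding.
metric = "cosine"  # whatever suits your data
reducer = umap.UMAP(metric=metric, n_neighbors=15, random_state=42)
# embedding = reducer.fit_transform(X)  # X: your (n_samples, n_features) array
```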

@mr-september (Author)

Thank you for the great response! Sorry to be a further bother, but do you have any recommended readings that discuss how t-SNE/UMAP/other DR tools mitigate the curse of dimensionality in NN identification? Or perhaps any brief comparisons using the data shared in this study? I know this would be quite a bit of work, and I completely understand if the lab has other priorities.
