word embeddings: graph of nearest neighbours #1434
lukavdplas started this conversation in Ideas
I spent some time working on a scatter plot last year (#783).
The motivation for a plot like that is that looking up the N nearest neighbours for one term at a time makes it difficult to explore or understand connections between words.
The proposed solution was to use dimension reduction to create a 2D projection of the embeddings. This is nice, but not without its problems.
One issue is that two words appearing close together in a 2D map suggests that the underlying embeddings are very similar. Because dimension reduction comes with information loss, this doesn't have to be true. (The optimal projection may be "sacrificing" an accurate distance between these two terms.)
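To make the information loss concrete, here is a small toy example. The vectors are made up, and instead of a real dimension reduction method it uses a crude stand-in: keeping the two coordinate axes with the highest variance (for this particular data that matches what a variance-maximising projection would keep, but that is an assumption of the toy setup, not a general claim).

```python
import math

# Made-up 3D "embeddings", purely for illustration.
points = {
    "a": [1.0, 0.0, 1.0],
    "b": [1.0, 0.0, -1.0],
    "c": [10.0, 0.0, 0.0],
    "d": [-10.0, 0.0, 0.0],
    "e": [0.0, 10.0, 0.0],
    "f": [0.0, -10.0, 0.0],
}


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms


# Crude stand-in for dimension reduction: keep the two axes
# along which the data varies the most, and drop the third.
axis_variance = [variance([p[i] for p in points.values()]) for i in range(3)]
kept_axes = sorted(range(3), key=lambda i: axis_variance[i], reverse=True)[:2]
kept_axes.sort()

projected = {name: [p[i] for i in kept_axes] for name, p in points.items()}

# "a" and "b" are orthogonal in the original space...
similarity = cosine(points["a"], points["b"])
# ...but land on exactly the same spot in the 2D map.
distance_2d = math.dist(projected["a"], projected["b"])

print(f"cosine similarity in 3D: {similarity:.2f}")
print(f"distance in 2D projection: {distance_2d:.2f}")
```

The only difference between "a" and "b" sits on the low-variance axis, so that is the part the projection throws away.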
This is problematic if someone uses the graph to draw conclusions about individual pairs of terms. There are definitely ways to use 2D/3D projections "responsibly" and assign meaningful interpretations to them, but our userbase may not be asking the types of questions that 2D maps can answer.
I've been thinking of an alternative way of visualising a "neighbourhood" by using a weighted graph instead. Basically:
(Note: the first two steps were also implemented in #1030)
Then use a library like d3 or sigmaJS to draw the graph. Ideally, the layout is force-directed, so closely related terms are pulled together.
Rough sketch:
The result is easier to interpret than a 2D map since you can actually see individual connections.
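A minimal sketch of how the graph data could be prepared on the backend, under some assumptions: similarity is cosine similarity, the neighbourhood is the query term plus its N nearest neighbours, and edges are kept only above a threshold. The function name `neighbourhood_graph`, the `min_weight` parameter, the toy vectors, and the nodes/links output shape are all hypothetical, not something that exists in the codebase.

```python
import json
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms


def neighbourhood_graph(embeddings, query, n=3, min_weight=0.5):
    """Build a weighted graph around `query` (hypothetical helper).

    Nodes are the query term and its n nearest neighbours; edges connect
    every pair of nodes whose similarity is at least `min_weight`.
    """
    # Rank all other terms by similarity to the query term.
    others = [term for term in embeddings if term != query]
    others.sort(key=lambda t: cosine(embeddings[query], embeddings[t]), reverse=True)
    terms = [query] + others[:n]

    # Compare every pair of selected terms, not just query vs. neighbour,
    # so connections between the neighbours themselves show up as well.
    links = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            weight = cosine(embeddings[a], embeddings[b])
            if weight >= min_weight:
                links.append({"source": a, "target": b, "weight": round(weight, 3)})

    return {"nodes": [{"id": term} for term in terms], "links": links}


# Made-up vectors, purely for illustration.
toy = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.2],
    "prince": [0.9, 0.6, 0.3],
    "apple": [0.1, 0.2, 0.9],
    "pear": [0.2, 0.1, 0.9],
}

graph = neighbourhood_graph(toy, "king", n=2)
print(json.dumps(graph, indent=2))
```

The frontend could then map `weight` to edge thickness or link strength in whatever graph library ends up being used.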