word embeddings: graph of nearest neighbours #1434

lukavdplas · 2023-09-13T11:56:48Z

lukavdplas
Sep 13, 2023
Maintainer

I spent some time working on a scatter plot last year (#783).

The motivation for a plot like that is that looking up the N nearest neighbours for one term at a time makes it difficult to explore or understand connections between words.

The proposed solution was to use dimension reduction to create a 2D projection of the embeddings. This is nice, but not without its problems.

One issue is that two words appearing close together in a 2D map suggests that the underlying embeddings are very similar. Because dimension reduction comes with information loss, this doesn't have to be true. (The optimal projection may be "sacrificing" an accurate distance between these two terms.)
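To make the information-loss point concrete, here is a toy sketch. It is not the projection used in our scatter plot: the "projection" simply drops the third coordinate, and the three points are made up for illustration. It only shows that a pair can look close in 2D while being far apart in the original space.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def project_2d(v):
    """Toy projection (an assumption for this example): keep the first
    two coordinates and discard the rest."""
    return v[:2]


# Made-up 3D "embeddings"; "a" and "b" differ mostly in the third coordinate.
points = {
    "a": (1.0, 0.0, 0.0),
    "b": (1.1, 0.0, 4.0),
    "c": (3.0, 0.0, 0.0),
}

d_ab_full = euclidean(points["a"], points["b"])
d_ac_full = euclidean(points["a"], points["c"])
d_ab_2d = euclidean(project_2d(points["a"]), project_2d(points["b"]))
d_ac_2d = euclidean(project_2d(points["a"]), project_2d(points["c"]))

# In the full space "c" is the nearer neighbour of "a";
# in the 2D map "b" looks nearer instead.
print(d_ab_full > d_ac_full, d_ab_2d < d_ac_2d)
```

Real dimension reduction methods choose the projection far more carefully, but they still have to compress some distances, so the same kind of reversal can happen for individual pairs.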

This is problematic if someone uses the graph to draw conclusions about individual pairs of terms. There are definitely ways to use 2D/3D projections "responsibly" and assign meaningful interpretations to them, but our userbase may not be asking the types of questions that 2D maps can answer.

I've been thinking of an alternative way of visualising a "neighbourhood" by using a weighted graph instead. Basically:

1. Select terms to include - these will be the vertices of the graph
2. Create a similarity matrix between the selected terms
3. Drop all similarities below a certain threshold
4. All remaining similarities become edges in the graph, where the similarity score is the weight of the edge.

(Note: the first two steps were also implemented in #1030)
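The steps above could look roughly like the sketch below. Everything in it is an assumption for illustration: the function names, the use of cosine similarity, the toy vectors and the threshold of 0.5. It is not the code from #1030.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_similarity_graph(embeddings, threshold):
    """Return (vertices, edges) for a weighted neighbourhood graph.

    embeddings: dict mapping each selected term to its vector.
    threshold: similarities below this value are dropped.
    Each edge is a (term_a, term_b, similarity) tuple.
    """
    vertices = sorted(embeddings)  # the selected terms are the vertices
    edges = []
    for a, b in combinations(vertices, 2):  # every unordered pair once
        similarity = cosine_similarity(embeddings[a], embeddings[b])
        if similarity >= threshold:  # keep only sufficiently similar pairs
            edges.append((a, b, similarity))
    return vertices, edges


# Made-up 3D vectors, purely for illustration.
toy_embeddings = {
    "king": (0.9, 0.1, 0.0),
    "queen": (0.8, 0.2, 0.1),
    "apple": (0.0, 0.1, 0.9),
}

vertices, edges = build_similarity_graph(toy_embeddings, threshold=0.5)
for a, b, weight in edges:
    print(f"{a} -- {b} (weight {weight:.2f})")
```

With these toy vectors only the "king"/"queen" pair survives the threshold, so "apple" ends up as an isolated vertex. Whether isolated vertices should be shown or hidden is a design choice this sketch does not settle.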

Then use a library like d3 or sigmaJS to draw the graph. Ideally, the library provides some kind of force simulation, so closely related terms are pulled together.
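On the backend side, this could be as simple as serialising the vertices and edges to JSON for the frontend. The sketch below assumes a node-link shape (nodes with an `id`, links with `source` and `target`), which is the kind of structure d3's link force can work with when given an id accessor. The function name, the `weight` key and the example data are hypothetical.

```python
import json


def to_node_link(vertices, edges):
    """Convert vertices and (source, target, weight) edges into a
    node-link dict that can be serialised to JSON."""
    return {
        "nodes": [{"id": term} for term in vertices],
        "links": [
            {"source": a, "target": b, "weight": weight}
            for a, b, weight in edges
        ],
    }


# Hypothetical example input.
example_vertices = ["king", "queen", "apple"]
example_edges = [("king", "queen", 0.98)]

graph_json = json.dumps(to_node_link(example_vertices, example_edges), indent=2)
print(graph_json)
```

The edge weight could then drive the strength or length of each link in the force simulation, so that high-similarity pairs sit closer together.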

Rough sketch:

The result is easier to interpret than a 2D map since you can actually see individual connections.
