Polarized Words

https://defenderofbasic.github.io/good-and-evil-concepts/

A 100% browser-based semantic embeddings demo. It computes an embedding for each input word and shows its relative distance to the pole words ("good" and "evil").

Here are Obama, Trump, Hitler, and kittens. Hitler is "evil" as expected. Trump is more "evil" than Obama, but both are far better than Hitler. Kittens are not as "good" as Obama.

What do these distances mean?

Concretely, these are word associations based on the training data of this language model. If the word "trump" frequently appears alongside negative words in newspapers, social media, etc., its embedding will end up closer to those words.

What it effectively tells us is how our culture talks about these ideas. It will reflect whatever biases or connections are in the training data. It's a way to poke inside the mind of a language model, and thus, the collective consciousness of society.

If we try polarizing words like "capitalism" and "communism", we get "communism" landing slightly closer to "good" than "capitalism". What this tells us is that, in aggregate, there is more net-positive discourse around communism, and perhaps more talk of the evils of capitalism.

But society is not a monolith. The aggregate view doesn't let us distinguish between (1) everyone feeling this specific way and (2) different tribes vehemently disagreeing. If we were to ask people to rate these words, we might see a bimodal distribution, where many rate "capitalism" as very good and "communism" as very evil, and others the reverse.

We can validate our findings by surveying people, and by searching for corroborating evidence in the embedding relationships. We can confirm that "capitalism" leans more right wing and "communism" leans more left wing. And as a sanity check, "Obama" and "trucks" fall where we would expect along this line.
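
Here is one hedged sketch of how such an axis check could be done (not necessarily how the demo computes it). It assumes an embed(text) helper that returns a normalized embedding vector, built with the pipeline shown in "How the code works" below. Because the vectors are normalized, a dot product is the cosine similarity, and the difference of similarities to two pole phrases places a word along that axis.

// Sketch only. `embed(text)` is an assumed helper that returns a normalized
// embedding vector (see "How the code works" below for the embedder setup).
function dot(a, b) {
    let sum = 0;
    for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
}

// Positive score => closer to poleA, negative => closer to poleB.
async function axisScore(word, poleA, poleB) {
    const [w, a, b] = await Promise.all([embed(word), embed(poleA), embed(poleB)]);
    return dot(w, a) - dot(w, b);
}

// Corroboration check from the paragraph above (the pole phrases are my guess):
for (const word of ['capitalism', 'communism', 'Obama', 'trucks']) {
    console.log(word, await axisScore(word, 'right wing', 'left wing'));
}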

How the code works

All of the code is in index.html. There is no build system; it's just hand-written HTML/JS (the initial UI was generated by Claude, and I added the semantic embedding parts).

Below is the minimal, complete vanilla JS snippet you need to (1) fetch an embedding model and (2) convert an arbitrary piece of text into an embedding vector.

// Load transformers.js from a CDN (pin a version in the URL, e.g. @xenova/transformers@<version>, for reproducibility)
import { pipeline, env } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';
env.allowLocalModels = false;
// Can pick any model from here:
//  https://huggingface.co/models?other=feature-extraction
const model_name = 'nomic-ai/nomic-embed-text-v1.5'

const embedder = await pipeline('feature-extraction', model_name,
{
    quantized: true,
    progress_callback: data => {
        const { progress, loaded, total } = data
        if (progress) {
            const totalMB = Math.round(total / (1024 * 1024))
            const loadedMB = Math.round(loaded / (1024 * 1024))
            
            console.log(`${Math.round(progress)}% (${loadedMB}/${totalMB} mb)`);
        }
    }
});

const inputText = 'Hello world!'
const embeddingVector = (await embedder(inputText, {pooling: 'mean', normalize: true})).data

From there you can compute the cosine similarity between two vectors to measure how close they are, or project the words into 2D to see their relationships in concept space.
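
For reference, here is a small cosine similarity helper (a sketch; the variable names are mine, not the demo's). Since the snippet above requests normalize: true, the vectors are already unit length and the cosine similarity reduces to a plain dot product, but the general form is:

function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Example: how close is "kittens" to "good"? (higher = semantically closer)
const good = (await embedder('good', { pooling: 'mean', normalize: true })).data;
const kittens = (await embedder('kittens', { pooling: 'mean', normalize: true })).data;
console.log(cosineSimilarity(good, kittens));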

For example, this is what it looks like if you get the semantic vectors for all emojis and plot them. You can see their relationships, and the "semantic gaps".

This is the exact same process that Kat (@poetengineer__) used here to visualize the latent space of colors. In other words, this is how an LLM "sees" color. It makes sense that similar colors cluster together.
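
To produce this kind of 2D map, the high-dimensional vectors have to be flattened onto a plane. One dependency-free way to do that is PCA via power iteration, sketched below; this is only an illustration, and the actual plots here and in Kat's visualization may use a different projection (t-SNE, UMAP, etc.).

// Sketch: project a list of embedding vectors (e.g. Float32Arrays from the
// embedder above) down to 2D with PCA (power iteration + deflation).
// Returns an array of [x, y] points ready to plot.
function pcaProject2D(vectors, iterations = 200) {
    const n = vectors.length;
    const d = vectors[0].length;
    const dot = (a, b) => {
        let s = 0;
        for (let i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    };

    // Center the data around its mean.
    const mean = new Array(d).fill(0);
    for (const v of vectors) for (let i = 0; i < d; i++) mean[i] += v[i] / n;
    const centered = vectors.map(v => Array.from(v, (x, i) => x - mean[i]));

    // Power iteration: find the dominant direction of variance.
    const topComponent = (data) => {
        let v = Array.from({ length: d }, () => Math.random() - 0.5);
        for (let it = 0; it < iterations; it++) {
            const next = new Array(d).fill(0);
            for (const x of data) {
                const p = dot(x, v);
                for (let i = 0; i < d; i++) next[i] += p * x[i];
            }
            const norm = Math.sqrt(dot(next, next)) || 1;
            v = next.map(c => c / norm);
        }
        return v;
    };

    const pc1 = topComponent(centered);
    // Remove the first direction, then find the second.
    const deflated = centered.map(x => {
        const p = dot(x, pc1);
        return x.map((c, i) => c - p * pc1[i]);
    });
    const pc2 = topComponent(deflated);

    return centered.map(x => [dot(x, pc1), dot(x, pc2)]);
}

// Hypothetical usage with the assumed embed() helper from earlier:
// const points = pcaProject2D(await Promise.all(emojis.map(e => embed(e))));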

The 🤯 part is how every single word or phrase fits somewhere in this space. Every word has "some" color association, and it is NOT random. It's cultural. We can test this by making the poles 🔴 red and 🟢 green, then testing words like "fire" and "grass", which go to red and green as expected. Then we can look for things "in the middle".

"Apple" goes slightly closer to red (perhaps "red apples" are a bit more represented in the discourse). "Gross" goes firmly to green. I tried "ewww" below to demonstrate that the words don't even have to be real: if they mean something to a human, the language model captures it.
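
The same axisScore sketch from earlier can express this red/green test (again, only an illustration of the idea, not the demo's exact code):

// Positive => leans toward "red", negative => leans toward "green".
for (const word of ['fire', 'grass', 'apple', 'ewww']) {
    console.log(word, await axisScore(word, 'red', 'green'));
}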

Future Work

It would be particularly interesting if we could monitor the change over time. Below is a hypothetical depiction of what we might see if we were monitoring this over the last few decades. In principle, every single word or concept in the language is actively moving - the question is, where? If a word isn't moving, it's likely held in place by opposing forces (like "capitalism" being tugged on by the left wing and the right wing in opposite directions).

[image: hypothetical chart of concept positions drifting over the decades]

This kind of interface could be used to actively engineer culture. Let's say you believe there is a positive cultural change (as depicted above) that you want to accelerate. You could write essays, launch campaigns, etc., and then check, before and after, whether anything "moved the needle" (or whether you accidentally nudged an unrelated thing in an undesired direction). The image below is a hypothetical proposal for a campaign to steer culture in a particular direction.

[image: hypothetical before/after measurement of a campaign to steer culture]
