
Getting similar cosine similarity scores for totally different queries #14

@rangelrey

Description


Here is an example to make it clearer:

I have a text like this (simplified): "John Smith is a film maker" --> I create an embedding of this text and store it in Redis.

My query is "Who is John Smith" --> the similarity score is 0.29.

Then I run another query (let's call it the nonsense query): "Who is randomNameorText?" --> the similarity score is 0.30 (higher is worse).

In some cases, the nonsense query even gets a better (lower) score than my query.
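For reference, here is a minimal sketch of how this score relates to cosine similarity, assuming the score shown is the value Redis returns for a vector field declared with `DISTANCE_METRIC COSINE`. Redis defines that metric as a cosine distance, 1 - cosine similarity, which is why lower is better: a score of 0.29 corresponds to a cosine similarity of about 0.71, and a score of 0.9 would require a similarity of about 0.1. The function names are illustrative, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance as used by the COSINE metric: 1 - similarity, in [0, 2].

    Lower means more similar."""
    return 1.0 - cosine_similarity(a, b)


# Toy vectors; real OpenAI embeddings have many more dimensions.
same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_distance([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Under this definition, a distance of 0.9 or more only appears when two vectors are close to orthogonal (or point away from each other).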

I do not understand this behaviour. Why do random questions get similar or even better scores than a legitimate question?

Furthermore, all nonsense questions get similarity scores close to 0.3; I have not seen any score of 0.4 or higher. For totally out-of-context questions, I would have expected scores of around 0.9.
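Expecting scores near 0.9 assumes that unrelated texts produce nearly orthogonal embeddings. One commonly reported explanation for a narrow score range like this is that embedding models, OpenAI's included, tend to place all vectors in a relatively narrow region of the space, so even unrelated texts keep a fairly high cosine similarity. The sketch below illustrates that effect with synthetic vectors only, not real OpenAI embeddings: each vector is random noise plus a scaled copy of one hypothetical shared direction, and the weight of 1.5 and dimension of 256 are arbitrary choices. The shared part pulls every pairwise cosine distance well below 1, even though the random parts are unrelated.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity, as in the COSINE metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


DIM = 256
rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical direction that every synthetic "embedding" shares.
shared = [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def synthetic_embedding(shared_weight):
    # Independent random part plus a scaled copy of the shared direction.
    own = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    return [s * shared_weight + o for s, o in zip(shared, own)]


def pairwise_distances(vectors):
    return [cosine_distance(vectors[i], vectors[j])
            for i in range(len(vectors))
            for j in range(i + 1, len(vectors))]


isotropic = [synthetic_embedding(0.0) for _ in range(20)]    # no shared part
anisotropic = [synthetic_embedding(1.5) for _ in range(20)]  # shared part added

iso = pairwise_distances(isotropic)
aniso = pairwise_distances(anisotropic)
print(f"no shared part:   min={min(iso):.2f} max={max(iso):.2f}")
print(f"with shared part: min={min(aniso):.2f} max={max(aniso):.2f}")
```

With no shared part, the distances between unrelated vectors sit around 1; with it, they cluster far lower, so a fixed threshold such as 0.9 is never reached.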

Technical information:
I am creating the embeddings with OpenAI and storing them in Redis.
I am using a FLAT index with TYPE FLOAT64 and DISTANCE_METRIC COSINE.
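To make this setup concrete, here is a sketch of the raw Redis commands it implies, built as plain argument lists in Python. The index name `docs_idx`, key prefix `doc:`, field names `text` and `embedding`, and the helper functions are hypothetical. `DIM = 1536` assumes OpenAI's `text-embedding-ada-002` model, which the original setup does not specify. One detail worth checking with `TYPE FLOAT64` is that vectors are packed as 8-byte doubles, so each blob must be exactly `DIM * 8` bytes. Packing them as 4-byte floats (for example with numpy's `float32`) would not match the index.

```python
import struct

# Hypothetical names; adjust to the real index, key prefix and fields.
INDEX_NAME = "docs_idx"
KEY_PREFIX = "doc:"
VECTOR_FIELD = "embedding"
DIM = 1536  # assumption: text-embedding-ada-002 returns 1536-dimensional vectors


def to_float64_bytes(vector):
    """Pack floats as little-endian 8-byte doubles to match TYPE FLOAT64."""
    return struct.pack(f"<{len(vector)}d", *vector)


def create_index_args():
    """FT.CREATE arguments for a FLAT vector index with FLOAT64 and COSINE."""
    return [
        "FT.CREATE", INDEX_NAME, "ON", "HASH", "PREFIX", "1", KEY_PREFIX,
        "SCHEMA", "text", "TEXT",
        # "6" is the number of attribute arguments that follow.
        VECTOR_FIELD, "VECTOR", "FLAT", "6",
        "TYPE", "FLOAT64", "DIM", str(DIM), "DISTANCE_METRIC", "COSINE",
    ]


def hset_args(doc_id, text, embedding):
    """HSET arguments storing the text and its FLOAT64 embedding under one key."""
    return ["HSET", f"{KEY_PREFIX}{doc_id}", "text", text,
            VECTOR_FIELD, to_float64_bytes(embedding)]


def knn_query_args(query_vector, k=3):
    """FT.SEARCH arguments for a KNN query; 'score' holds the cosine distance."""
    return [
        "FT.SEARCH", INDEX_NAME,
        f"*=>[KNN {k} @{VECTOR_FIELD} $vec AS score]",
        "PARAMS", "2", "vec", to_float64_bytes(query_vector),
        "SORTBY", "score", "RETURN", "2", "text", "score",
        "DIALECT", "2",
    ]
```

With redis-py, each list could be sent as `r.execute_command(*args)`. The `score` alias in the KNN query is the cosine distance (1 - cosine similarity) that Redis computes for the `COSINE` metric, so sorting by it in ascending order puts the most similar documents first.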
