Getting similar cosine similarity scores for totally different queries #14

Here is an example to make it clearer:

I have a text like this (simplified): "John Smith is a film maker". I create an embedding of this text and store it in Redis.

My query is "Who is John Smith". The similarity score is 0.29.

Then I run another query (let's call it the nonsense query): "Who is randomNameorText?". The similarity score is 0.30 (higher is worse).

In some cases the nonsense query even gets a better (lower) score than my real query.

I do not understand this behaviour. Why do random questions get scores similar to, or even better than, a legitimate question?

Furthermore, all nonsense questions get similarity scores close to 0.3. I have not seen any score of 0.4 or higher. I would have expected totally out-of-context questions to get scores around 0.9.

Technical information:
I am creating the embeddings with OpenAI and storing them in Redis.
I am using a FLAT index with type FLOAT64 and distance_metric cosine.
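
For reference, the score can be recomputed outside Redis to rule out an indexing problem. This is a minimal sketch, assuming the reported score is cosine distance (1 minus cosine similarity), which matches the "higher is worse" reading above. The helper name `cosine_distance` and the toy vectors are illustrative only; they are not real OpenAI embeddings.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two vectors, computed in float64."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos_sim


# Toy stand-ins for a stored document embedding and three query embeddings.
doc = [1.0, 0.0, 0.0]
print(cosine_distance(doc, [2.0, 0.0, 0.0]))   # same direction as doc
print(cosine_distance(doc, [0.0, 1.0, 0.0]))   # orthogonal to doc
print(cosine_distance(doc, [-1.0, 0.0, 0.0]))  # opposite direction to doc
```

Under that assumption the score ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite). Feeding the actual stored and query embeddings into the same function would show whether the 0.29 and 0.30 values come from the embeddings themselves or from the index.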
