Description
Here is an example to make it clearer:
I have a text like this (simplified): "John Smith is a film maker" --> I create an embedding of this text and store it in Redis
My query is "Who is John Smith" --> The similarity score is 0.29
Then I run another query (let's call it the nonsense query): "Who is randomNameorText?" --> The similarity score is 0.30 (higher is worse)
In some cases the nonsense query even gets a better (lower) score than my real query.
I do not understand this behaviour. Why do random questions get better or similar scores than a legitimate question?
Furthermore, all nonsense questions get similarity scores close to 0.3. I have not seen any score of 0.4 or higher. I would have expected scores around 0.9 for totally out-of-context questions.
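For reference, here is a minimal sketch of how I assume the score is computed: as a cosine distance (1 minus cosine similarity), so 0 means the same direction and higher is worse. The vectors are hypothetical toy values, not real embeddings, and `cosine_distance` is my own helper, not a Redis function.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy vectors (assumption: real embeddings have many more dimensions).
doc = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]
opposite = [-1.0, 0.0]

print(cosine_distance(doc, same_direction))  # 0.0
print(cosine_distance(doc, orthogonal))      # 1.0
print(cosine_distance(doc, opposite))        # 2.0
```

Under this assumption, a score of 0.9 would require two nearly orthogonal vectors, which is what I expected for unrelated questions.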
Technical information:
I am creating the embeddings with OpenAI and storing them in Redis.
The index uses the FLAT algorithm, type FLOAT64, and distance_metric COSINE.
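In case it matters, this is a rough sketch of how a vector can be serialized for a FLOAT64 field: 8 bytes per dimension. The helper names and the sample values are hypothetical, and I am assuming little-endian byte order; this is not my exact code.

```python
import struct

def to_float64_blob(vector):
    """Pack a list of floats as little-endian 8-byte doubles."""
    return struct.pack(f"<{len(vector)}d", *vector)

def from_float64_blob(blob):
    """Unpack a blob produced by to_float64_blob back into a list of floats."""
    count = len(blob) // 8  # 8 bytes per FLOAT64 value
    return list(struct.unpack(f"<{count}d", blob))

# Hypothetical 3-dimensional vector (assumption: real embeddings are much longer).
embedding = [0.25, -0.5, 0.125]
blob = to_float64_blob(embedding)

print(len(blob))                             # 24
print(from_float64_blob(blob) == embedding)  # True
```

If the stored blob length is not 8 times the index dimension, the bytes were probably written with a different float width than the index declares.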