UserWarning: Relevance scores must be between 0 and 1 #19

Open
amotl opened this issue Nov 21, 2023 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

amotl commented Nov 21, 2023

About

Running the test cases now emits a warning. Most probably, it was introduced by GH-15, which changed the style of the similarity search query and, in turn, the value range of the returned CrateDB-native _score values.

/path/to/langchain/libs/langchain/langchain/schema/vectorstore.py:313:
UserWarning: Relevance scores must be between 0 and 1, got 
[(Document(page_content='foo', metadata={'page': '0'}), 1.414213562373095), (Document(page_content='bar', metadata={'page': '1'}), 1.0606601717798212), (Document(page_content='baz', metadata={'page': '2'}), 0.8485281374238569)]

Evaluation

CrateDB computes its _score values based on different criteria of the input SQL query expression, the execution plan, or the actual execution. As a result, they don't directly convey useful information about the actual vector similarity or distance, and they are not guaranteed to fall within the 0..1 range LangChain expects for relevance scores.

Suggestion

Use a corresponding function provided by CrateDB to compute the similarity distance independently of the CrateDB-native _score value.
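
To illustrate the idea, here is a minimal sketch in plain Python. It checks the three scores from the warning above against the 0..1 range, and shows one possible way to derive a bounded relevance value from a Euclidean distance instead. The helper names and the `1 / (1 + d)` mapping are assumptions made for illustration; they are not necessarily what CrateDB or LangChain use.

```python
import math


def euclidean_distance(a, b):
    """Plain Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_relevance(distance):
    """Map a distance in [0, inf) to a relevance value in (0, 1].

    Assumption: 1 / (1 + d) is just one common normalization,
    chosen here for illustration only.
    """
    return 1.0 / (1.0 + distance)


def in_unit_range(scores):
    """Return True if every score lies between 0 and 1, inclusive."""
    return all(0.0 <= s <= 1.0 for s in scores)


# The raw _score values reported in the warning above.
raw_scores = [1.414213562373095, 1.0606601717798212, 0.8485281374238569]
print(in_unit_range(raw_scores))

# A relevance value derived from the distance stays within bounds.
relevance = distance_to_relevance(euclidean_distance([1.0, 0.0], [0.0, 1.0]))
print(in_unit_range([relevance]))
```

The point is only that a value derived from the distance itself has a known range, whereas the raw _score does not.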

/cc @ckurze, @seut, @matriv

amotl commented Apr 11, 2024

This one might also be interesting, because it discusses potential similar [sic!] woes with other vector stores.

At least, it tells us that not every store is getting it right from the very beginning, wrt. what LangChain or other applications might want or expect.

amotl commented Jul 17, 2024

Hi. CrateDB 5.8.0 added a vector_similarity() function, which may help improve the situation here. Thank you so much!
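
A rough sketch of how such a query might be assembled, assuming that vector_similarity() accepts two float_vector arguments and returns a value between 0 and 1. The helper name, the table and column names, and the `?` placeholder style are hypothetical and only serve the illustration; the statement is built as a string and not executed here.

```python
def build_similarity_query(table: str, column: str, k: int) -> str:
    """Build a SQL statement that ranks rows by vector_similarity().

    Assumptions: `table` and `column` are trusted identifiers, and the
    query vector is passed later as a bound parameter in place of `?`.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (
        f'SELECT *, vector_similarity("{column}", ?) AS relevance '
        f'FROM "{table}" '
        f"ORDER BY relevance DESC "
        f"LIMIT {int(k)}"
    )


# Hypothetical table and column names.
stmt = build_similarity_query("embeddings", "embedding", 4)
print(stmt)
```

If the function behaves as assumed, the `relevance` column could be returned to LangChain instead of _score.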

amotl commented Oct 31, 2024

We are now addressing this leftover issue in the upstream patch.
