-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CrateDB to LangChain LLM framework #1
base: release-v0.3.4
Are you sure you want to change the base?
Conversation
f8d8d49
to
85bf8c7
Compare
9fd7cd7
to
c139332
Compare
b35e066
to
b4289e6
Compare
Initial commit of rl_chain code
8a8fc4e
to
e3d07c4
Compare
docs/docs/modules/data_connection/document_loaders/sqlalchemy.mdx
Outdated
Show resolved
Hide resolved
libs/langchain/tests/integration_tests/document_loaders/test_sqlalchemy_cratedb.py
Outdated
Show resolved
Hide resolved
29cf863
to
f75a3d7
Compare
It is a special adapter which provides similarity search across multiple collections. It can not be used for indexing documents.
The CrateDB adapter works a bit different compared to the pgvector adapter it is building upon: Because the dimensionality of the vector field needs to be specified at table creation time, but because it is also a runtime parameter in LangChain, the table creation needs to be delayed. In some cases, the tables do not exist yet, but this is only relevant for the case when the user requests to pre-delete the collection, using the `pre_delete_collection` argument. So, do the error handling only there instead, and _not_ on the generic data model utility functions.
…eddings The performance gains can be substantially.
The test cases can be written substantially more elegant.
The CrateDB SQLAlchemy dialect needs more love, so it was separated from the DBAPI HTTP driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that I am not able to provide a meaningful review without a deeper understanding of LangChain.
Said that, I believe it is important work and it is great to see that we are getting closer to complete it.
def embed_query(self, text: str) -> List[float]: | ||
"""Return consistent embeddings for the text, if seen before, or a constant | ||
one if the text is unknown.""" | ||
return self.embed_documents([text])[0] | ||
if text not in self.known_texts: | ||
return [float(1.0)] * (self.dimensionality - 1) + [float(0.0)] | ||
return [float(1.0)] * (self.dimensionality - 1) + [ | ||
float(self.known_texts.index(text)) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create an issue to track the work?
Those pages have been submitted to LangChain already.
At LangChain and CrateDB, we are maintaining a set of educational resources to learn a bit about their dynamic duo. That may spark your interest? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This document is the main entrypoint that reflects details about CrateDB on the LangChain documentation, in its major LangChain Providers section. Please review it correspondingly, and suggest relevant edits. Thanks!
About
Discussing the patch to add support for CrateDB to LangChain, to be submitted upstream. Do not merge.
What's inside
FLOAT_VECTOR
/KNN_MATCH
functionality through LangChain's vector store subsystem.Documentation
Notebooks
Backlog
FIXME
andTODO
remarks./cc @matriv, @seut, @marijaselakovic, @karynzv: You may also want to have a review on it? Thanks!