Add support for CrateDB to LangChain LLM framework #1

Draft
wants to merge 28 commits into
base: release-v0.3.4

Conversation

@amotl amotl commented Sep 16, 2023

About

Discussing the patch to add support for CrateDB to LangChain, to be submitted upstream. Do not merge.

What's inside

Documentation

Notebooks

Backlog

/cc @matriv, @seut, @marijaselakovic, @karynzv: You may also want to review it? Thanks!

@amotl amotl force-pushed the cratedb branch 5 times, most recently from 29cf863 to f75a3d7 Compare October 27, 2023 20:39
It is a special adapter that provides similarity search across multiple
collections. It cannot be used for indexing documents.
The CrateDB adapter works a bit differently from the pgvector adapter
it builds upon: the dimensionality of the vector field must be
specified at table creation time, but it is also a runtime parameter
in LangChain, so table creation needs to be delayed.
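
A minimal sketch of the delayed table creation described above. It is not the adapter's actual code: the `LazyVectorTable` helper, the table name, and the column names are hypothetical, and the helper only renders a DDL string instead of talking to a database. `FLOAT_VECTOR(n)` is CrateDB's vector column type.

```python
from typing import List, Optional


class LazyVectorTable:
    """Hypothetical helper: defers DDL until the vector size is known."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.dimensions: Optional[int] = None
        self.ddl: Optional[str] = None

    def ensure_created(self, vector: List[float]) -> str:
        # The first vector seen determines the column's dimensionality.
        if self.ddl is None:
            self.dimensions = len(vector)
            self.ddl = (
                f'CREATE TABLE IF NOT EXISTS "{self.name}" '
                f"(id TEXT PRIMARY KEY, embedding FLOAT_VECTOR({self.dimensions}))"
            )
        elif len(vector) != self.dimensions:
            # Later vectors must match the size the table was created with.
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )
        return self.ddl


table = LazyVectorTable("embedding_sketch")
statement = table.ensure_created([0.1, 0.2, 0.3])
```

In a real adapter, the rendered statement would be executed on first write rather than at construction time, which is the point of the delay.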

In some cases, the tables do not exist yet. This is only relevant when
the user requests to pre-delete the collection, using the
`pre_delete_collection` argument. So, do the error handling only
there, and _not_ in the generic data model utility functions.
…eddings

The performance gains can be substantial.
The test cases can be written substantially more elegantly.
The CrateDB SQLAlchemy dialect needs more love, so it was separated from
the DBAPI HTTP driver.

@kneth kneth left a comment

I believe that I am not able to provide a meaningful review without a deeper understanding of LangChain.

That said, I believe it is important work, and it is great to see that we are getting closer to completing it.

Comment on lines 52 to 60
def embed_query(self, text: str) -> List[float]:
    """Return consistent embeddings for the text, if seen before, or a constant
    one if the text is unknown."""
    return self.embed_documents([text])[0]
    if text not in self.known_texts:
        return [float(1.0)] * (self.dimensionality - 1) + [float(0.0)]
    return [float(1.0)] * (self.dimensionality - 1) + [
        float(self.known_texts.index(text))
    ]
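
For readers unfamiliar with this test helper, here is a self-contained sketch of how the branch-based variant in the quoted lines could behave. The class name `ConsistentFakeEmbeddingsSketch`, the `_vector` helper, and the `embed_documents` body are assumptions for illustration; only the `embed_query` logic mirrors the quoted hunk.

```python
from typing import List


class ConsistentFakeEmbeddingsSketch:
    """Hypothetical stand-in for a deterministic fake embeddings class."""

    def __init__(self, dimensionality: int = 10) -> None:
        self.known_texts: List[str] = []
        self.dimensionality = dimensionality

    def _vector(self, last: int) -> List[float]:
        # All components are 1.0, except the last one, which encodes the index.
        return [1.0] * (self.dimensionality - 1) + [float(last)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Assumption: indexing registers each new text and encodes its position.
        vectors = []
        for text in texts:
            if text not in self.known_texts:
                self.known_texts.append(text)
            vectors.append(self._vector(self.known_texts.index(text)))
        return vectors

    def embed_query(self, text: str) -> List[float]:
        # Unknown texts get a constant vector and are not registered.
        if text not in self.known_texts:
            return self._vector(0)
        return self._vector(self.known_texts.index(text))


emb = ConsistentFakeEmbeddingsSketch(dimensionality=3)
emb.embed_documents(["foo", "bar"])
known = emb.embed_query("bar")
unknown = emb.embed_query("never seen")
```

Note that in this sketch the first registered text also has index 0, so it shares its vector with unknown texts.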
Can we create an issue to track the work?

@amotl amotl changed the base branch from cratedb-v0.3.4 to release-v0.3.4 October 25, 2024 17:39
amotl commented Oct 29, 2024

> I believe that I am not able to provide a meaningful review without a deeper understanding of LangChain.

At LangChain and CrateDB, we are maintaining a set of educational resources to learn a bit about their dynamic duo. That may spark your interest?

@amotl amotl Oct 29, 2024

This document is the main entrypoint for details about CrateDB in the LangChain documentation, within its main LangChain Providers section. Please review it accordingly, and suggest relevant edits. Thanks!

4 participants