Add support for CrateDB to LangChain LLM framework #1

amotl · 2023-09-16T18:10:01Z

About

Discussing the patch to add support for CrateDB to LangChain, to be submitted upstream. Do not merge.

What's inside

Support for CrateDB's FLOAT_VECTOR / KNN_MATCH functionality through LangChain's vector store subsystem.
Support for loading documents from CrateDB through LangChain's document loader subsystem.
Support for managing "chat" history using LangChain's conversational memory subsystem.
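
For readers unfamiliar with the CrateDB side, here is a rough sketch of the kind of SQL that a FLOAT_VECTOR / KNN_MATCH based vector store might rely on. It only builds statement strings and does not connect to a database. The helper names, the table name `docs`, and the column names are hypothetical and chosen for illustration; this is not necessarily what the patch emits.

```python
from typing import Sequence


def create_table_sql(table: str, column: str, dimensions: int) -> str:
    """Build a CREATE TABLE statement with a FLOAT_VECTOR column.

    Hypothetical helper: identifiers are interpolated directly, which is
    acceptable for a sketch but not for untrusted input.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"(id TEXT PRIMARY KEY, content TEXT, {column} FLOAT_VECTOR({dimensions}))"
    )


def knn_query_sql(
    table: str, column: str, query_vector: Sequence[float], k: int
) -> str:
    """Build a nearest-neighbour query using KNN_MATCH, ranked by _score."""
    # Render the query vector as an SQL array literal, e.g. [0.1, 0.2, 0.3].
    vector_literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    return (
        f"SELECT content, _score FROM {table} "
        f"WHERE KNN_MATCH({column}, {vector_literal}, {k}) "
        f"ORDER BY _score DESC LIMIT {k}"
    )


if __name__ == "__main__":
    print(create_table_sql("docs", "embedding", 3))
    print(knn_query_sql("docs", "embedding", [0.1, 0.2, 0.3], 2))
```

The explicit LIMIT caps the overall result size; in a real adapter, the vector would more likely be passed as a bound parameter than inlined.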

Documentation

integrations/providers/cratedb.mdx

Notebooks

Backlog

See comments below.
Break out SQLDatabaseLoader documentation into separate patch.
Review all FIXME and TODO remarks.
Bring documentation up to speed. Have a look at those blueprints:

/cc @matriv, @seut, @marijaselakovic, @karynzv: You may also want to review it. Thanks!

docs/docs/integrations/document_loaders/cratedb.ipynb

docs/docs/integrations/document_loaders/sqlalchemy.ipynb

docs/docs/integrations/providers/cratedb.mdx

docs/docs/modules/data_connection/document_loaders/sqlalchemy.mdx

libs/langchain/tests/integration_tests/document_loaders/test_sqlalchemy_cratedb.py

libs/langchain/langchain/vectorstores/cratedb/base.py

kneth

I believe that I am not able to provide a meaningful review without a deeper understanding of LangChain.

That said, I believe it is important work, and it is great to see that we are getting closer to completing it.

kneth · 2024-10-25T09:50:13Z

libs/langchain/tests/integration_tests/cache/fake_embeddings.py

    def embed_query(self, text: str) -> List[float]:
        """Return consistent embeddings for the text, if seen before, or a constant
        one if the text is unknown."""
-        return self.embed_documents([text])[0]
+        if text not in self.known_texts:
+            return [float(1.0)] * (self.dimensionality - 1) + [float(0.0)]
+        return [float(1.0)] * (self.dimensionality - 1) + [
+            float(self.known_texts.index(text))
+        ]
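
To make the effect of this change easier to review, the following is a minimal, self-contained stand-in for the fake embeddings class with the patched `embed_query`. The class name `ConsistentFakeEmbeddingsSketch` and the body of `embed_documents` are assumptions for illustration (only `embed_query` appears in the diff above); the real test helper may differ.

```python
from typing import List


class ConsistentFakeEmbeddingsSketch:
    """Stand-in for the test helper; not the actual LangChain class."""

    def __init__(self, dimensionality: int = 10) -> None:
        self.known_texts: List[str] = []
        self.dimensionality = dimensionality

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Assumed behaviour: register unseen texts, then encode the text's
        # index within known_texts as the last vector component.
        vectors = []
        for text in texts:
            if text not in self.known_texts:
                self.known_texts.append(text)
            vectors.append(
                [1.0] * (self.dimensionality - 1)
                + [float(self.known_texts.index(text))]
            )
        return vectors

    def embed_query(self, text: str) -> List[float]:
        # Patched behaviour from the diff: an unknown text yields a constant
        # vector and is NOT added to known_texts.
        if text not in self.known_texts:
            return [1.0] * (self.dimensionality - 1) + [0.0]
        return [1.0] * (self.dimensionality - 1) + [
            float(self.known_texts.index(text))
        ]


if __name__ == "__main__":
    emb = ConsistentFakeEmbeddingsSketch(dimensionality=4)
    emb.embed_documents(["foo", "bar"])
    print(emb.embed_query("bar"))
    print(emb.embed_query("never seen"))
    print(emb.known_texts)
```

Under these assumptions, querying no longer mutates `known_texts`. One consequence worth keeping in mind: an unknown query gets the same vector as the first known text, because both end in `0.0`.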


Can we create an issue to track the work?

amotl · 2024-10-29T08:34:50Z

I believe that I am not able to provide a meaningful review without a deeper understanding of LangChain.

At LangChain and CrateDB, we maintain a set of educational resources for learning a bit about this dynamic duo. Perhaps that will spark your interest?

amotl · 2024-10-29T11:53:49Z

docs/docs/integrations/providers/cratedb.mdx

This document is the main entry point for details about CrateDB in the LangChain documentation, within its Providers section. Please review it accordingly and suggest relevant edits. Thanks!

amotl · 2024-12-24T15:06:40Z

Hi.

I believe it is important work, and it is great to see that we are getting closer to completing it.

The fundamental implementation (vector store, document loader, chat history) from this genesis patch has been moved into a dedicated package, langchain-cratedb. 🎄

PyPI: https://pypi.org/project/langchain-cratedb/
Repository: https://github.com/crate/langchain-cratedb/
Documentation (local): https://cratedb.com/docs/guide/integrate/langchain/
Documentation (remote): https://python.langchain.com/docs/integrations/providers/cratedb/
Examples: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain

We will follow up on this fundamental implementation and add new features and documentation as we go. This PR and feature branch will be closed/deleted in January 2025.

With kind regards,
Andreas.

amotl force-pushed the cratedb branch from 466775a to 99c0a1f Compare September 16, 2023 18:14

amotl mentioned this pull request Jun 16, 2024

SQLAlchemy: Polyfill for transparently synchronizing data with REFRESH TABLE crate/sqlalchemy-cratedb#83

Open

amotl force-pushed the cratedb branch 4 times, most recently from f8d8d49 to 85bf8c7 Compare September 16, 2023 22:37

amotl mentioned this pull request Sep 18, 2023

SQLAlchemy: Polyfill for AUTOINCREMENT columns crate/sqlalchemy-cratedb#77

Open

amotl force-pushed the cratedb branch 3 times, most recently from 9fd7cd7 to c139332 Compare September 17, 2023 22:01

amotl mentioned this pull request Sep 17, 2023

[LangChain] Add example programs and notebooks crate/cratedb-examples#85

Merged

1 task

amotl force-pushed the cratedb branch 2 times, most recently from b35e066 to b4289e6 Compare September 18, 2023 11:09

amotl force-pushed the cratedb branch from b4289e6 to 1843ce4 Compare September 25, 2023 20:08

amotl force-pushed the cratedb branch from 083a1e6 to 9d968fe Compare October 3, 2023 21:22

amotl mentioned this pull request Oct 10, 2023

Contrib: Add a few SQLAlchemy patches and polyfills crate/cratedb-toolkit#59

Merged

amotl pushed a commit that referenced this pull request Oct 11, 2023

Merge pull request #1 from VowpalWabbit/add_rl_chain

e942330

Initial commit of rl_chain code

amotl force-pushed the cratedb branch 7 times, most recently from 8a8fc4e to e3d07c4 Compare October 17, 2023 14:58

amotl commented Oct 19, 2023

amotl force-pushed the cratedb branch 5 times, most recently from 29cf863 to f75a3d7 Compare October 27, 2023 20:39

amotl added 11 commits October 25, 2024 07:27

CrateDB vector: Use SA's bulk_save_objects method for inserting embeddings

53aee67

The performance gains can be substantial.

CrateDB vector: Test non-deterministic values by using pytest.approx

70685ce

The test cases can be written substantially more elegantly.

CrateDB vector: Fix initialization of vector dimensionality

ccd2a25

CrateDB: Refactor to langchain_community

800ace6

CrateDB vector: Adjustments for updates to pgvector adapter

b40c24f

CrateDB vector: Relax test constraint

cb06a66

CrateDB loader: SQLAlchemyLoader has been superseded by SQLDatabaseLoader

fa28b24

CrateDB: Migrate from crate[sqlalchemy] to sqlalchemy-cratedb

41ccacf

The CrateDB SQLAlchemy dialect needs more love, so it was separated from the DBAPI HTTP driver.

CrateDB: Stop using CrateDB Toolkit

3bc63a8

CrateDB: Stop using local FloatVector implementation

c561a95

CrateDB: Format code. Satisfy linter and type checker. ruff + mypy

8b278a8

amotl force-pushed the cratedb branch from def8c35 to 8b278a8 Compare October 25, 2024 05:28

kneth reviewed Oct 25, 2024

amotl changed the base branch from cratedb-v0.3.4 to release-v0.3.4 October 25, 2024 17:39

amotl mentioned this pull request Oct 28, 2024

Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query #28

Closed

amotl added 2 commits October 28, 2024 21:52

CrateDB: Remove adjustment to ConsistentFakeEmbeddings in langchain-core

41f6462

CrateDB: Refactor leftovers from langchain-core to langchain-community

19a09ab

amotl mentioned this pull request Oct 28, 2024

docs: Add how-to guide for SQLDatabaseLoader langchain-ai/langchain#27696

Closed

amotl added 2 commits October 29, 2024 00:19

CrateDB: Remove documentation about SQLDatabaseLoader

91da770

Those pages have been submitted to LangChain already.

CrateDB: Remove leftovers in langchain-core

1faedfe

amotl commented Oct 29, 2024

This was referenced Dec 15, 2024

Genesis: Add support for CrateDB crate/langchain-cratedb#1

Merged

LLM: Update to langchain-cratedb 0.0.0 crate/cratedb-examples#773

Merged
