Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML/LlamaIndex: Add software tests and CI configuration #707

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

amotl
Copy link
Member

@amotl amotl commented Nov 2, 2024

About

A followup on GH-691.

  • Adding software tests and a CI configuration, to verify the example program does not break.
  • Nightly runs will verify that the integration will always work, or signal when it breaks.
  • The outcome will be reflected on the Build Status page.
  • Dependabot will drive the project dependencies.

References

Comment on lines -75 to +84
CRATEDB_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true"
CRATEDB_SQLALCHEMY_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRATEDB_SQLALCHEMY_URL is the designated environment variable name for the SQLAlchemy connection string, contrary to CRATEDB_HTTP_URL, which is suitable for the Python DB API or crash, for example.

In order to not use different environment variables on our tutorials and educational material, let's adjust this to adhere to that convention.

Comment on lines +1 to +23
CREATE TABLE IF NOT EXISTS time_series_data (
timestamp TIMESTAMP,
value DOUBLE,
location STRING,
sensor_id INT
);

INSERT INTO time_series_data (timestamp, value, location, sensor_id)
VALUES
('2023-09-14T00:00:00', 10.5, 'Sensor A', 1),
('2023-09-14T01:00:00', 15.2, 'Sensor A', 1),
('2023-09-14T02:00:00', 18.9, 'Sensor A', 1),
('2023-09-14T03:00:00', 12.7, 'Sensor B', 2),
('2023-09-14T04:00:00', 17.3, 'Sensor B', 2),
('2023-09-14T05:00:00', 20.1, 'Sensor B', 2),
('2023-09-14T06:00:00', 22.5, 'Sensor A', 1),
('2023-09-14T07:00:00', 18.3, 'Sensor A', 1),
('2023-09-14T08:00:00', 16.8, 'Sensor A', 1),
('2023-09-14T09:00:00', 14.6, 'Sensor B', 2),
('2023-09-14T10:00:00', 13.2, 'Sensor B', 2),
('2023-09-14T11:00:00', 11.7, 'Sensor B', 2);

REFRESH TABLE time_series_data;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new init.sql file could also be used within the README to be provisioned by using an incantation to crash.

Comment on lines 1 to 7
langchain-openai<0.3
llama-index<0.12
llama-index-embeddings-langchain<0.3
llama-index-embeddings-openai<0.3
llama-index-llms-azure-openai<0.3
llama-index-llms-openai<0.3
sqlalchemy-cratedb
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After cleaning up dependencies, those are apparently all we need to define.

Comment on lines 5 to 22
from cratedb_toolkit.io.sql import DatabaseAdapter

HERE = Path(__file__).parent


@pytest.fixture()
def cratedb() -> DatabaseAdapter:
return DatabaseAdapter(dburi="crate://crate@localhost:4200")


@pytest.fixture(scope="function", autouse=True)
def init_database(cratedb):
"""
Initialize database.
"""
cratedb.run_sql("DROP TABLE IF EXISTS time_series_data;")
cratedb.run_sql((HERE / "init.sql").read_text())
Copy link
Member Author

@amotl amotl Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonprickett: You have been looking at more high-level database interfaces to CrateDB, beyond what standard interfaces (DB API, xDBC, CLI) provide, right?

At least on the Python side, there are a few SDK-like layers in CTK, to be used in occasions like those, or others. They are intended to minimize lines of code needed to conduct certain repetitive procedures, by adding a few bells and whistles here and there. Up until now, the procedures are both originating from, and focusing on support for writing notebooks, demo snippets, and software tests.

Do you like that style of providing value-added details on top of the driver layer, i.e. does it resonate with your ideas in any way?

Copy link
Member Author

@amotl amotl Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from cratedb_toolkit.io.sql import DatabaseAdapter

CTK's DatabaseAdapter is effectively a little wrapper around a few fundamentals, also using SQLAlchemy, in this spirit very similar to the SQLDatabase wrappers of LangChain and LlamaIndex.

Copy link
Member Author

@amotl amotl Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that LangChains also bears another layer on top of its SQLDatabase wrapper, the SQLDatabaseLoader.

That one is also pretty generic, so it can be counted in on the enumeration of re-usable high-level components to interact with CrateDB, mostly powered by SQLAlchemy in one way or another.

It has been coming from those contributions, broken out of crate-workbench/langchain#1:

@amotl amotl force-pushed the ci-llama-index branch 3 times, most recently from 87583d1 to 9a17ef1 Compare November 2, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant