-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ML/LlamaIndex: Add software tests and CI configuration #707
base: main
Are you sure you want to change the base?
Conversation
CRATEDB_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true" | ||
CRATEDB_SQLALCHEMY_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CRATEDB_SQLALCHEMY_URL
is the designated environment variable name for the SQLAlchemy connection string, contrary to CRATEDB_HTTP_URL
, which is suitable for the Python DB API or crash
, for example.
In order to not use different environment variables on our tutorials and educational material, let's adjust this to adhere to that convention.
CREATE TABLE IF NOT EXISTS time_series_data ( | ||
timestamp TIMESTAMP, | ||
value DOUBLE, | ||
location STRING, | ||
sensor_id INT | ||
); | ||
|
||
INSERT INTO time_series_data (timestamp, value, location, sensor_id) | ||
VALUES | ||
('2023-09-14T00:00:00', 10.5, 'Sensor A', 1), | ||
('2023-09-14T01:00:00', 15.2, 'Sensor A', 1), | ||
('2023-09-14T02:00:00', 18.9, 'Sensor A', 1), | ||
('2023-09-14T03:00:00', 12.7, 'Sensor B', 2), | ||
('2023-09-14T04:00:00', 17.3, 'Sensor B', 2), | ||
('2023-09-14T05:00:00', 20.1, 'Sensor B', 2), | ||
('2023-09-14T06:00:00', 22.5, 'Sensor A', 1), | ||
('2023-09-14T07:00:00', 18.3, 'Sensor A', 1), | ||
('2023-09-14T08:00:00', 16.8, 'Sensor A', 1), | ||
('2023-09-14T09:00:00', 14.6, 'Sensor B', 2), | ||
('2023-09-14T10:00:00', 13.2, 'Sensor B', 2), | ||
('2023-09-14T11:00:00', 11.7, 'Sensor B', 2); | ||
|
||
REFRESH TABLE time_series_data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new init.sql
file could also be used within the README to be provisioned by using an incantation to crash
.
langchain-openai<0.3 | ||
llama-index<0.12 | ||
llama-index-embeddings-langchain<0.3 | ||
llama-index-embeddings-openai<0.3 | ||
llama-index-llms-azure-openai<0.3 | ||
llama-index-llms-openai<0.3 | ||
sqlalchemy-cratedb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After cleaning up dependencies, those are apparently all we need to define.
from cratedb_toolkit.io.sql import DatabaseAdapter | ||
|
||
HERE = Path(__file__).parent | ||
|
||
|
||
@pytest.fixture() | ||
def cratedb() -> DatabaseAdapter: | ||
return DatabaseAdapter(dburi="crate://crate@localhost:4200") | ||
|
||
|
||
@pytest.fixture(scope="function", autouse=True) | ||
def init_database(cratedb): | ||
""" | ||
Initialize database. | ||
""" | ||
cratedb.run_sql("DROP TABLE IF EXISTS time_series_data;") | ||
cratedb.run_sql((HERE / "init.sql").read_text()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonprickett: You have been looking at more high-level database interfaces to CrateDB, beyond what standard interfaces (DB API, xDBC, CLI) provide, right?
At least on the Python side, there are a few SDK-like layers in CTK, to be used in occasions like those, or others. They are intended to minimize lines of code needed to conduct certain repetitive procedures, by adding a few bells and whistles here and there. Up until now, the procedures are both originating from, and focusing on support for writing notebooks, demo snippets, and software tests.
Do you like that style of providing value-added details on top of the driver layer, i.e. does it resonate with your ideas in any way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from cratedb_toolkit.io.sql import DatabaseAdapter
CTK's DatabaseAdapter
is effectively a little wrapper around a few fundamentals, also using SQLAlchemy, in this spirit very similar to the SQLDatabase
wrappers of LangChain and LlamaIndex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that LangChains also bears another layer on top of its SQLDatabase
wrapper, the SQLDatabaseLoader
.
That one is also pretty generic, so it can be counted in on the enumeration of re-usable high-level components to interact with CrateDB, mostly powered by SQLAlchemy in one way or another.
It has been coming from those contributions, broken out of crate-workbench/langchain#1:
87583d1
to
9a17ef1
Compare
About
A followup on GH-691.
References