-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Title : Run EvaDB on Postgres #922
Conversation
circleci
Circle CI link : https://app.circleci.com/pipelines/github/Chitti-Ankith/eva/19 |
if config_obj["experimental"]["use_postgres_backend"] is False: | ||
return f"sqlite:///{evadb_dir.resolve()}/{DB_DEFAULT_NAME}" | ||
else: | ||
return f"postgresql://postgres:password@localhost:5432/{PG_DB_DEFAULT_NAME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this string to catalog_database_uri
in evadb.yml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added fields in the yml file to specify the username, password, host and port instead. This gives the clients a more flexible option to connect.
evadb/configuration/constants.py
Outdated
@@ -28,5 +28,6 @@ | |||
CACHE_DIR = "cache" | |||
DATASET_DATAFRAME_NAME = "dataset" | |||
DB_DEFAULT_NAME = "evadb.db" | |||
PG_DB_DEFAULT_NAME = "circle_test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is only for circleci
testing, we can hardcode it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T̶̶̶h̶̶̶e̶̶̶r̶̶̶e̶̶̶ ̶i̶̶̶s̶̶̶ ̶s̶̶̶o̶̶̶m̶̶̶e̶̶̶ ̶i̶̶̶s̶̶̶s̶̶̶u̶̶̶e̶̶̶ ̶i̶̶̶n̶̶̶ ̶c̶̶̶i̶̶̶r̶̶̶c̶̶̶l̶̶̶e̶̶̶c̶̶̶i̶̶̶ ̶w̶̶̶h̶̶̶e̶̶̶r̶̶̶e̶̶̶i̶̶̶n̶̶̶ ̶t̶̶̶h̶̶̶e̶̶̶ ̶p̶̶̶o̶̶̶s̶̶̶t̶̶̶g̶̶̶r̶̶̶e̶̶̶s̶̶̶ ̶i̶̶̶m̶̶̶a̶̶̶g̶̶̶e̶̶̶ ̶p̶̶̶u̶̶̶l̶̶̶l̶̶̶e̶̶̶d̶̶̶ ̶i̶̶̶s̶̶̶ ̶n̶̶̶o̶̶̶t̶̶̶ ̶h̶̶̶o̶̶̶n̶̶̶o̶̶̶r̶̶̶i̶̶̶n̶̶̶g̶̶̶ ̶t̶̶̶h̶̶̶e̶̶̶ ̶e̶̶̶n̶̶̶v̶̶̶i̶̶̶r̶̶̶o̶̶̶n̶̶̶m̶̶̶e̶̶̶n̶̶̶t̶̶̶ ̶v̶̶̶a̶̶̶r̶̶̶i̶̶̶a̶̶̶b̶̶̶l̶̶̶e̶̶̶ ̶f̶̶̶o̶̶̶r̶̶̶ ̶d̶̶̶b̶̶̶_̶̶̶n̶̶̶a̶̶̶m̶̶̶e̶̶̶,̶ ̶a̶̶̶n̶̶̶d̶̶̶ ̶i̶̶̶s̶̶̶ ̶i̶̶̶n̶̶̶s̶̶̶t̶̶̶e̶̶̶a̶̶̶d̶̶̶ ̶g̶̶̶e̶̶̶t̶̶̶t̶̶̶i̶̶̶n̶̶̶g̶̶̶ ̶d̶̶̶e̶̶̶f̶̶̶a̶̶̶u̶̶̶l̶̶̶t̶̶̶e̶̶̶d̶̶̶ ̶t̶̶̶o̶̶̶ ̶'̶c̶̶̶i̶̶̶r̶̶̶c̶̶̶l̶̶̶e̶̶̶_̶̶̶t̶̶̶e̶̶̶s̶̶̶t̶̶̶'̶.̶ ̶H̶̶̶e̶̶̶n̶̶̶c̶̶̶e̶̶̶ ̶I̶̶̶ ̶n̶̶̶a̶̶̶m̶̶̶e̶̶̶d̶̶̶ ̶i̶̶̶t̶̶̶ ̶a̶̶̶s̶̶̶ ̶s̶̶̶u̶̶̶c̶̶̶h̶̶̶ ̶h̶̶̶e̶̶̶r̶̶̶e̶̶̶.̶
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a bug open on this : CircleCI-Public/cimg-postgres#94
Update : The above bug only affects the pg cron job scheduler, the issue was from our end. I have updated the db default name to "evadb" now.
evadb/database.py
Outdated
if config_obj["experimental"]["use_postgres_backend"] is False: | ||
return f"sqlite:///{evadb_dir.resolve()}/{DB_DEFAULT_NAME}" | ||
else: | ||
return f"postgresql://postgres:password@localhost:5432/{PG_DB_DEFAULT_NAME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, we should allow users to specify an existing Postgres db server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
setup.py
Outdated
@@ -52,6 +52,7 @@ def read(path, encoding="utf-8"): | |||
"aenum>=2.2.0", | |||
"diskcache>=5.4.0", | |||
"retry>=0.9.2", | |||
"psycopg2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us not make it a default requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -36,8 +36,8 @@ def init_db(engine: Engine): | |||
if not database_exists(engine.url): | |||
logger.info("Database does not exist, creating database.") | |||
create_database(engine.url) | |||
logger.info("Creating tables") | |||
BaseModel.metadata.create_all(bind=engine) | |||
logger.info("Creating tables") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized. We have function duplication here. We should be able to remove one instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pg_password = config_obj["postgres"]["pg_password"] | ||
|
||
if not pg_password: | ||
return f"postgresql://{pg_user}@{pg_host}:{pg_port}/{PG_DB_DEFAULT_NAME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think if we ask the user to provide this string directly instead of 4 parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, catalog_database_uri
in config.yml
is populated with sqlite
string. I suggest we use it for postgres. Users can change it to Postgres.
Let me know if I'm missing some complexity with this approach. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main concern was if users set 'use_postgres_backend' to true without setting the catalog_database_uri field, they will get an authentication issue since we don't have a default pg uri. However your suggested approach is much simpler, I have made the changes and added comments to ensure the field is set.
evadb/evadb.yml
Outdated
ray: False | ||
ray: False | ||
# Set the postgres db uri to connect to in 'catalog_database_uri' | ||
use_postgres_backend: False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this flag? We can infer it directly from the catalog_database_uri
. We can avoid asking users for an extra thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, removed it.
Overall LGTM! Fix the linter issue and address a minor comment. Thanks! |
Thanks for the review @gaurav274 ! |
Summary : Made changes to provide an option that would run EVA on top of Postgres server instead of SQLite. Created a CircleCI Job to validate the changes. Currently the job is only run for Python 3.10, we can add more configurations if needed.
Also fixed the UTs that were failing on CircleCI runs in default master.
Test Plan : UT
Reviewers : @jarulraj @gaurav274