Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Title : Run EvaDB on Postgres #922

Merged
merged 28 commits into from
Jul 15, 2023
Merged

Conversation

Chitti-Ankith
Copy link
Contributor

Summary : Made changes to provide an option that would run EVA on top of Postgres server instead of SQLite. Created a CircleCI Job to validate the changes. Currently the job is only run for Python 3.10, we can add more configurations if needed.

Also fixed the UTs that were failing on CircleCI runs in default master.

Test Plan : UT
Reviewers : @jarulraj @gaurav274

@Chitti-Ankith
Copy link
Contributor Author

if config_obj["experimental"]["use_postgres_backend"] is False:
return f"sqlite:///{evadb_dir.resolve()}/{DB_DEFAULT_NAME}"
else:
return f"postgresql://postgres:password@localhost:5432/{PG_DB_DEFAULT_NAME}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this string to catalog_database_uri in evadb.yml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added fields in the yml file to specify the username, password, host and port instead. This gives the clients a more flexible option to connect.

@@ -28,5 +28,6 @@
CACHE_DIR = "cache"
DATASET_DATAFRAME_NAME = "dataset"
DB_DEFAULT_NAME = "evadb.db"
PG_DB_DEFAULT_NAME = "circle_test"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only for circleci testing, we can hardcode it.

Copy link
Contributor Author

@Chitti-Ankith Chitti-Ankith Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T̶̶̶h̶̶̶e̶̶̶r̶̶̶e̶̶̶ ̶i̶̶̶s̶̶̶ ̶s̶̶̶o̶̶̶m̶̶̶e̶̶̶ ̶i̶̶̶s̶̶̶s̶̶̶u̶̶̶e̶̶̶ ̶i̶̶̶n̶̶̶ ̶c̶̶̶i̶̶̶r̶̶̶c̶̶̶l̶̶̶e̶̶̶c̶̶̶i̶̶̶ ̶w̶̶̶h̶̶̶e̶̶̶r̶̶̶e̶̶̶i̶̶̶n̶̶̶ ̶t̶̶̶h̶̶̶e̶̶̶ ̶p̶̶̶o̶̶̶s̶̶̶t̶̶̶g̶̶̶r̶̶̶e̶̶̶s̶̶̶ ̶i̶̶̶m̶̶̶a̶̶̶g̶̶̶e̶̶̶ ̶p̶̶̶u̶̶̶l̶̶̶l̶̶̶e̶̶̶d̶̶̶ ̶i̶̶̶s̶̶̶ ̶n̶̶̶o̶̶̶t̶̶̶ ̶h̶̶̶o̶̶̶n̶̶̶o̶̶̶r̶̶̶i̶̶̶n̶̶̶g̶̶̶ ̶t̶̶̶h̶̶̶e̶̶̶ ̶e̶̶̶n̶̶̶v̶̶̶i̶̶̶r̶̶̶o̶̶̶n̶̶̶m̶̶̶e̶̶̶n̶̶̶t̶̶̶ ̶v̶̶̶a̶̶̶r̶̶̶i̶̶̶a̶̶̶b̶̶̶l̶̶̶e̶̶̶ ̶f̶̶̶o̶̶̶r̶̶̶ ̶d̶̶̶b̶̶̶_̶̶̶n̶̶̶a̶̶̶m̶̶̶e̶̶̶,̶ ̶a̶̶̶n̶̶̶d̶̶̶ ̶i̶̶̶s̶̶̶ ̶i̶̶̶n̶̶̶s̶̶̶t̶̶̶e̶̶̶a̶̶̶d̶̶̶ ̶g̶̶̶e̶̶̶t̶̶̶t̶̶̶i̶̶̶n̶̶̶g̶̶̶ ̶d̶̶̶e̶̶̶f̶̶̶a̶̶̶u̶̶̶l̶̶̶t̶̶̶e̶̶̶d̶̶̶ ̶t̶̶̶o̶̶̶ ̶'̶c̶̶̶i̶̶̶r̶̶̶c̶̶̶l̶̶̶e̶̶̶_̶̶̶t̶̶̶e̶̶̶s̶̶̶t̶̶̶'̶.̶ ̶H̶̶̶e̶̶̶n̶̶̶c̶̶̶e̶̶̶ ̶I̶̶̶ ̶n̶̶̶a̶̶̶m̶̶̶e̶̶̶d̶̶̶ ̶i̶̶̶t̶̶̶ ̶a̶̶̶s̶̶̶ ̶s̶̶̶u̶̶̶c̶̶̶h̶̶̶ ̶h̶̶̶e̶̶̶r̶̶̶e̶̶̶.̶

Copy link
Contributor Author

@Chitti-Ankith Chitti-Ankith Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a bug open on this : CircleCI-Public/cimg-postgres#94

Update : The above bug only affects the pg cron job scheduler, the issue was from our end. I have updated the db default name to "evadb" now.

if config_obj["experimental"]["use_postgres_backend"] is False:
return f"sqlite:///{evadb_dir.resolve()}/{DB_DEFAULT_NAME}"
else:
return f"postgresql://postgres:password@localhost:5432/{PG_DB_DEFAULT_NAME}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, we should allow users to specify an existing Postgres db server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

setup.py Outdated
@@ -52,6 +52,7 @@ def read(path, encoding="utf-8"):
"aenum>=2.2.0",
"diskcache>=5.4.0",
"retry>=0.9.2",
"psycopg2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us not make it a default requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -36,8 +36,8 @@ def init_db(engine: Engine):
if not database_exists(engine.url):
logger.info("Database does not exist, creating database.")
create_database(engine.url)
logger.info("Creating tables")
BaseModel.metadata.create_all(bind=engine)
logger.info("Creating tables")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized. We have function duplication here. We should be able to remove one instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pg_password = config_obj["postgres"]["pg_password"]

if not pg_password:
return f"postgresql://{pg_user}@{pg_host}:{pg_port}/{PG_DB_DEFAULT_NAME}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think if we ask the user to provide this string directly instead of 4 parameters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, catalog_database_uri in config.yml is populated with sqlite string. I suggest we use it for postgres. Users can change it to Postgres.
Let me know if I'm missing some complexity with this approach. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main concern was if users set 'use_postgres_backend' to true without setting the catalog_database_uri field, they will get an authentication issue since we don't have a default pg uri. However your suggested approach is much simpler, I have made the changes and added comments to ensure the field is set.

evadb/evadb.yml Outdated
ray: False
ray: False
# Set the postgres db uri to connect to in 'catalog_database_uri'
use_postgres_backend: False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this flag? We can infer it directly from the catalog_database_uri. We can avoid asking users for an extra thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, removed it.

@gaurav274
Copy link
Member

Overall LGTM! Fix the linter issue and address a minor comment. Thanks!

@Chitti-Ankith
Copy link
Contributor Author

Thanks for the review @gaurav274 !

@gaurav274 gaurav274 merged commit 81f889b into georgia-tech-db:master Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants