Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the mixture of async endpoints with a sync db connection #901

Merged
merged 1 commit into from
Jan 26, 2024

Conversation

shangyian
Copy link
Contributor

Summary

We should not mix async API endpoints with a synchronous db connection (psycopg3). Because our database client does not support async, every database call will introduce blocking calls to the event loop, eventually resulting in the uvicorn workers timing out.

The issue is very similar to the one described here (in their case it was sync API call vs sync database query): tiangolo/uvicorn-gunicorn-fastapi-docker#145 (comment)

Test Plan

N/A

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

Ideally we deploy this right away. Otherwise over time the deployed DJ servers will experience more and more blocking calls, gradually causing workers to time out.

… causes issues as our database client not support async, so every database call introduces blocking calls to the event loop, eventually resulting in the workers timing out.
Copy link

netlify bot commented Jan 26, 2024

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
🔨 Latest commit 1245f28
🔍 Latest deploy log https://app.netlify.com/sites/thriving-cassata-78ae72/deploys/65b2fadc1cf89c000842b1a2

@shangyian shangyian marked this pull request as ready for review January 26, 2024 00:20
Copy link
Member

@agorajek agorajek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shangyian for catching this. @betodealmeida have you seen something like this before?

@CircArgs
Copy link
Contributor

@shangyian seeing as the bottleneck has been the orm and db calls, should DJ move to async engines and sessions?

@samredai and I were working on something and came out to approximately:

def get_engine() -> AsyncEngine:
    """
    Create the metadata engine.
    """
    ...

    def getconn():
        conn = connector.connect(instance_connection_name, "pg8000", user=db_user, password=db_pass, db=db_name)
        return conn

    engine = sqlalchemy.create_engine("postgresql+pg8000://", creator=getconn, future=True)
    return engine

as opposed to

def get_engine() -> Engine:

@shangyian
Copy link
Contributor Author

shangyian commented Jan 26, 2024

@CircArgs yeah, I was investigating this in December and started working on a PR for it, but the necessary changes are actually huge: main...shangyian:dj:use-asyncpg

The part where we just create an async engine is pretty easy, i.e., main...shangyian:dj:use-asyncpg#diff-5ebc477598017d1697e464ee79c28d2cdd36c55b2b904a358a8e5918ab4b2109R13.

However, we also have to prevent implicit io which SQLAlchemy does whenever we dynamically pull in different objects on a given model: sqlalchemy/sqlalchemy#6165 (for example if we call node.parents). Accounting for that across all our endpoints is... not that easy. It's definitely not impossible though and I think I got to a point where 50% of tests were passing. If you want to take a stab at it, go for it!

Also fwiw I don't think bottleneck was database calls, but rather database calls that block the eventloop, which only happens when we issue sync calls from an async endpoint. We might eventually get to a point where database throughput is a problem, but we're not there yet (at least internally at Netflix, that is).

@CircArgs
Copy link
Contributor

@shangyian hmm sqlalchemy strikes again

necessary changes are actually huge
...
if you want to take a stab at it, go for it!

😨

@shangyian
Copy link
Contributor Author

hmm sqlalchemy strikes again

@CircArgs yeah it's probably also that we weren't using it in the most efficient way... but nonetheless, here we are :(

😨

Yeaaaah unfun

@shangyian shangyian merged commit c79b275 into DataJunction:main Jan 26, 2024
24 checks passed
@shangyian shangyian deleted the remove-sync-async-mix branch January 26, 2024 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants