Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SQLAlchemy for Rewriter #420

Open
joshua-oss opened this issue Nov 6, 2021 · 0 comments
Open

Use SQLAlchemy for Rewriter #420

joshua-oss opened this issue Nov 6, 2021 · 0 comments
Labels
2 - Medium 🍪 enhancement New feature or request

Comments

@joshua-oss
Copy link
Contributor

The rewriter currently generates the subquery using a Postgres-compatible AST, with the individual XXXSerializer classes responsible for converting the AST to the target engine's dialect. Although SQLAlchemy cannot map arbitrarily from one SQL text dialect to another (which is also a non-goal for us), SQLAlchemy can map from a universal Python code graph to any compatible SQL text dialect. Switching the rewriter to create a Python code graph and then using SQLAlchemy to serialize to SQL text would have some benefits:

  • More maintainable code for complex processing, such as sampling for quantiles.
  • Easier to support more engines. New dialects would only need to be covered in the grammar enough to be parsed into our AST to be used to drive the rewriter. Since we only support a subset of SQL-92 for differentially private processing, this is easier than trying to make adapters for everything the rewriter might need.
  • Can deprecate Serializer

We already have a dependency on SQLAlchemy for Pandas support, so this proposal introduces no new dependencies.

The current rewriter maps column names, including generated intermediate columns, safely across all of the subqueries. SQLAlchemy requires table metadata, similar to, but not the same as, our metadata. For the rewriter to work, we will need to implement a mapping from our metadata to SQLAlchemy's metadata, ensuring that both are compatible with the actual tables supplied in the connection.

A straightforward implementation would be to walk the parsed AST (from our supported and validated subset of SQL-92), and generate the appropriate SQLAlchemy Python statements. IOW, generate a single-purpose Python execution and use SQLAlchemy to serialize it to SQL text.

An alternate implementation would be to augment the AST to return SQLAlchemy functions, so any parsed AST could be used to generate SQL text in any other dialect. Then the rewriter would simply use a parsed AST, or AST fragments, to generate the subquery's SQL text for the target engine. This would allow the rewriter to take a more hybrid style, mixing and matching explicit Python execution graph and SQL AST.

@joshua-oss joshua-oss added enhancement New feature or request 2 - Medium 🍪 labels Nov 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - Medium 🍪 enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant