fix(sqlalchemy): do not emit CREATE SCHEMA on Oracle (#3939) #4001

Open
DRACULA1729 wants to merge 1 commit into dlt-hub:devel from DRACULA1729:fix/oracle-sqlalchemy-create-schema

Conversation

@DRACULA1729

Description

Loading to Oracle through the sqlalchemy destination fails on the first run with ORA-02420: missing schema authorization clause. dlt runs CREATE SCHEMA "<dataset>" when it cannot find the target dataset, and Oracle rejects that. In Oracle a schema is a database user, so there is no bare CREATE SCHEMA (it needs CREATE SCHEMA AUTHORIZATION <user> ...).

This adds dataset-lifecycle extension points to DialectCapabilities and routes the SQL client through them. The base implementations keep the old behavior for every other dialect, so nothing changes for Postgres/MySQL/etc.

OracleDialectCapabilities overrides them:

  • dataset_exists matches case-insensitively, since Oracle folds unquoted identifiers to upper case. Loading into an existing schema is now detected and no creation is attempted.
  • create_dataset no longer emits CREATE SCHEMA. If the schema is missing it raises a terminal error explaining that the schema (and the <dataset>_staging schema, for merge/replace) has to be created up front as a user with grants, instead of surfacing the cryptic ORA-02420.
  • drop_dataset drops the tables inside the schema instead of DROP SCHEMA, which on Oracle would require DROP USER (a DBA privilege).
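
The three overrides above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: only the hook names (dataset_exists, create_dataset, drop_dataset) come from the description, while the class names, method signatures, the FakeCatalog stand-in for a database connection, and TerminalSetupError are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


class TerminalSetupError(Exception):
    """Hypothetical stand-in for a terminal (non-retryable) error."""


@dataclass
class FakeCatalog:
    """Hypothetical in-memory database: schema name -> table names."""

    schemas: Dict[str, Set[str]] = field(default_factory=dict)
    executed: List[str] = field(default_factory=list)

    def execute(self, sql: str) -> None:
        # Record the statement instead of sending it to a server.
        self.executed.append(sql)


class BaseCapabilitiesSketch:
    """Assumed default behavior: exact-match lookup, CREATE/DROP SCHEMA."""

    def dataset_exists(self, catalog: FakeCatalog, name: str) -> bool:
        return name in catalog.schemas

    def create_dataset(self, catalog: FakeCatalog, name: str) -> None:
        catalog.execute(f'CREATE SCHEMA "{name}"')
        catalog.schemas.setdefault(name, set())

    def drop_dataset(self, catalog: FakeCatalog, name: str) -> None:
        catalog.execute(f'DROP SCHEMA "{name}"')
        catalog.schemas.pop(name, None)


class OracleCapabilitiesSketch(BaseCapabilitiesSketch):
    """Oracle variant: never creates or drops the schema (user) itself."""

    def _find(self, catalog: FakeCatalog, name: str) -> Optional[str]:
        # Compare case-insensitively, since Oracle folds unquoted names to upper case.
        wanted = name.upper()
        for existing in catalog.schemas:
            if existing.upper() == wanted:
                return existing
        return None

    def dataset_exists(self, catalog: FakeCatalog, name: str) -> bool:
        return self._find(catalog, name) is not None

    def create_dataset(self, catalog: FakeCatalog, name: str) -> None:
        if self.dataset_exists(catalog, name):
            return  # nothing to create, and no CREATE SCHEMA is emitted
        raise TerminalSetupError(
            f"Schema {name!r} does not exist. On Oracle a schema is a database "
            f"user, so create it (and {name}_staging for merge/replace) up front."
        )

    def drop_dataset(self, catalog: FakeCatalog, name: str) -> None:
        existing = self._find(catalog, name)
        if existing is None:
            return
        # Drop the tables only; the schema (user) itself is left in place.
        for table in sorted(catalog.schemas[existing]):
            catalog.execute(f'DROP TABLE "{existing}"."{table}"')
        catalog.schemas[existing].clear()
```

With this shape, a SQL client would only call the three hooks and stay unaware of the dialect, which is why the other dialects keep their previous behavior.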

Related Issues

Additional Context

  • Follows the maintainer guidance in "Pipeline state tables are not created when using Oracle via SQLAlchemy as a destination" (#3141): staging features are not hard-restricted; if the user pre-creates the <dataset>_staging schema, merge/replace still work.
  • Oracle is not in the sqlalchemy-destination CI matrix, so the new tests are unit-level (mock-based) in tests/load/sqlalchemy/test_sqlalchemy_dialect.py and run without a live Oracle.
  • Docs updated: the Oracle limitations section and the dialect-capabilities extension-points table in sqlalchemy.md.
  • Ran locally: the new dialect tests pass; ruff, mypy, flake8 and black are clean on the changed files.

Oracle schemas are owned by database users and cannot be created with a
bare `CREATE SCHEMA` statement, so the sqlalchemy destination failed with
`ORA-02420: missing schema authorization clause` when initializing storage.

Add dataset-lifecycle extension points (`dataset_exists`, `create_dataset`,
`drop_dataset`) to `DialectCapabilities` and delegate to them from the SQL
client. The base implementations preserve the previous behavior for all
other dialects. `OracleDialectCapabilities` overrides them to:

- match schema existence case-insensitively (Oracle folds identifiers to
  upper case), so loading into an existing schema is detected and no
  creation is attempted
- skip `CREATE SCHEMA`; raise a clear terminal error when the target schema
  (or `<dataset>_staging`) does not exist, instead of the cryptic ORA-02420
- drop the tables within the schema instead of `DROP SCHEMA` (which would
  require `DROP USER`, a DBA privilege)

Add unit tests for the new hooks and document Oracle's existing-schema
requirement in the destination docs.
DRACULA1729 force-pushed the fix/oracle-sqlalchemy-create-schema branch from ef3ee3a to 31bcc80 on May 29, 2026 21:30
@DRACULA1729

DRACULA1729 commented May 29, 2026

Author

This is ready for review. The fork-gated workflows (the ones that need repo secrets) are stuck in the "requires reviewer approval" state, so they need a maintainer to approve the run before they execute.

@ivasio you dug into this exact Oracle case in #3141, so it's probably familiar territory. @rudolfix it extends the DialectCapabilities system from #3600 with dataset-lifecycle hooks. Happy to rebase or adjust anything.

@rudolfix

rudolfix commented Jun 5, 2026

Collaborator

@DRACULA1729 were you able to run data loading correctly with this fix? The Oracle CI setup is still not ready on our side, but my quick tests revealed a few further problems with the type mapper, case folding, etc. I'd also allow dlt to create and drop users; otherwise our test suites will never pass. That could be disabled in production, though.

rudolfix self-assigned this Jun 5, 2026
@DRACULA1729

Author

Spun up Oracle 23 free-lite locally and dug into it. Here's what I found.

Loading into an existing schema (the user's own) works with the current fix: data loads and reads back fine. The breakage shows up on the second run, with ORA-22848: cannot use CLOB type as comparison key. dlt's text columns (pipeline_name, version_hash) map to CLOB, and get_stored_state compares them in WHERE/JOIN. That's the type-mapper problem you hit.

The tricky bit: the short key columns want VARCHAR2, but schema is already ~3.7KB for a single-table pipeline and state can grow large, so those have to stay CLOB. dlt's text type doesn't distinguish a short key from a large blob, so I wanted your call on the direction:
(a) give the internal key columns an explicit precision in the common schema (cleanest, but it touches every destination and changes the schema hash),
(b) keep an Oracle-local mapper that sends unbounded text to VARCHAR2(4000) and only the known large internal columns to CLOB, or
(c) override the state/schema lookups on Oracle so the CLOB comparison is done safely.
Which way would you go?
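
For reference, option (b) could look roughly like this. It is a minimal sketch under stated assumptions, not an implementation of dlt's type mapper: the function name, the LARGE_INTERNAL_COLUMNS set, and returning type names as plain strings are all hypothetical, and the 4000 limit assumes VARCHAR2 with the standard maximum string size.

```python
from typing import Optional

# Hypothetical: internal columns assumed to hold large payloads and stay CLOB.
LARGE_INTERNAL_COLUMNS = {"schema", "state"}

# VARCHAR2 limit assumed here (standard maximum string size).
VARCHAR2_MAX = 4000


def oracle_text_type(column_name: str, precision: Optional[int] = None) -> str:
    """Pick an Oracle type name for a dlt text column (illustrative only)."""
    if column_name in LARGE_INTERNAL_COLUMNS:
        # Large internal payloads cannot fit a VARCHAR2, so keep them as CLOB.
        return "CLOB"
    if precision is None:
        # Unbounded text becomes the widest VARCHAR2, which is comparable in WHERE/JOIN.
        return f"VARCHAR2({VARCHAR2_MAX})"
    if precision <= VARCHAR2_MAX:
        return f"VARCHAR2({precision})"
    # An explicit precision above the limit still needs CLOB.
    return "CLOB"
```

The trade-off in this sketch is that any unbounded user text column would also become VARCHAR2(4000), so values longer than that would no longer load unless the column declares a larger precision.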

On case folding: the factory sets casefold_identifier = str.lower for Oracle, but Oracle folds unquoted identifiers to upper, so it should be str.upper for interop with existing tables. Easy fix.
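
A small sketch of why the direction matters. It assumes dlt quotes identifiers after applying casefold_identifier (the quoted CREATE SCHEMA "<dataset>" in the description suggests so); the helper oracle_stored_name and the table name are hypothetical and only model how Oracle records names in its catalog.

```python
def oracle_stored_name(identifier: str, quoted: bool) -> str:
    """Model of the name Oracle stores: unquoted identifiers are folded to
    upper case, quoted identifiers are kept verbatim."""
    return identifier if quoted else identifier.upper()


# A table created by hand with an unquoted name ends up upper-cased.
existing = oracle_stored_name("my_table", quoted=False)

# Names dlt would look up, assuming it quotes the case-folded identifier.
with_lower = oracle_stored_name(str.lower("My_Table"), quoted=True)
with_upper = oracle_stored_name(str.upper("My_Table"), quoted=True)

print(existing, with_lower, with_upper)
print("str.lower matches existing table:", with_lower == existing)
print("str.upper matches existing table:", with_upper == existing)
```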

On create/drop users: agreed it's needed for the suite to pass. To make it line up with your CI, will it connect to a PDB (so dataset-named users are valid without the C## prefix) and grant the loader CREATE USER/DROP USER? And what password and tablespace should created users get? Happy to implement once I know the shape you're aiming for so it doesn't clash with the CI you're setting up.

Development

Successfully merging this pull request may close these issues.

Oracle 12c ORA-02420: missing schema authorization clause
