Skip to content

fix(databricks): lazily create memtable volume at first use#11962

Open
NickCrews wants to merge 1 commit intoibis-project:mainfrom
NickCrews:databricks-lazy-memtables
Open

fix(databricks): lazily create memtable volume at first use#11962
NickCrews wants to merge 1 commit intoibis-project:mainfrom
NickCrews:databricks-lazy-memtables

Conversation

@NickCrews
Copy link
Contributor

Fixes #11598

Previously, the memtable volume was created eagerly in _post_connect() immediately after connecting, which caused read-only users to fail with a PermissionDenied error even when they never intended to use memtables.

This change defers volume creation until the first memtable is actually registered via _register_in_memory_table(), allowing read-only users to connect and query existing data without requiring CREATE VOLUME privilege.

This is a redo of #11956,
where I take a more complete approach:

  • move all logic into a MemtablesManager class. I didn't like how the state was spread throughout the rest of the Backend, it was leaking everywhere. Now I think it is a lot more clear who is responsible for what
  • This fixed a bug where in .do_connect() we set _memtable_catalog and _memtable_database, but in .from_connection() we did not. I'm pretty sure and backend instance created with .from_connection would be in an invalid state and wouldn't have been able to use memtables at all. By centralizing the memtable management into a class, this also reduces the chance for this sort of bug where backends can get created in multiple ways.
  • This changes how the memtable_volume param is interpreted. Before, it was interpreted as just the suffix to a larger path that was generated, eg if you passed foo then the memtables would actually be created at /Volumes/the_current_catalog_at_init_time/the_current_database_at_init_time/foo. After this change, you just get the plain foo, so the called has much more control over what they want the volume to be.

I didn't add tests because I couldn't find a way to easily create a readonly connection. Perhaps we could create a new role in our databricks account that we use for testing? But mostly I treat this as a refactor, so if existing tests don't begin failing, that's good enough for me.

@github-actions github-actions bot added the databricks The Databricks backend label Mar 5, 2026
Fixes ibis-project#11598

Previously, the memtable volume was created eagerly in _post_connect()
immediately after connecting, which caused read-only users to fail with
a PermissionDenied error even when they never intended to use memtables.

This change defers volume creation until the first memtable is actually
registered via _register_in_memory_table(), allowing read-only users to
connect and query existing data without requiring CREATE VOLUME privilege.
@NickCrews NickCrews force-pushed the databricks-lazy-memtables branch from 42b386f to 4429f4c Compare March 5, 2026 20:54
@NickCrews NickCrews changed the title fix(databricks): Lazily create memtable volume at first use fix(databricks): lazily create memtable volume at first use Mar 5, 2026
@NickCrews
Copy link
Contributor Author

@VENKATASAI1994 what do you think of this alternate formulation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

databricks The Databricks backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Unable to connect to Databrick with Read only roles

1 participant