Implement databricks creds manager #2123
Conversation
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #2123      +/-   ##
==========================================
+ Coverage   63.96%   64.29%   +0.32%
==========================================
  Files         101       99       -2
  Lines        8651     8696      +45
  Branches      894      909      +15
==========================================
+ Hits         5534     5591      +57
+ Misses       2947     2930      -17
- Partials      170      175       +5
```

☔ View full report in Codecov by Sentry.

✅ 51/51 passed, 5 flaky, 4m2s total
Running from acceptance #3337
# Conflicts:
#   src/databricks/labs/lakebridge/reconcile/connectors/jdbc_reader.py
#   src/databricks/labs/lakebridge/reconcile/connectors/oracle.py
…e plus major cleanup
… in one test for it to be green
The goal of the dict is to contain the connection properties, for example:

```json
{
  "user": "scope/key",
  "password": "scope/key2",
  "host": "scope3/key4",
  "port": "scope4/key5",
  "database": "scope/key3",
  "encrypt": "scope/key",
  "trustServerCertificate": "scope/key"
}
```

i.e. mapping logical connection options to secret references. In other words, my intent in this PR was always to model the connection properties / OPTIONS(...) side (N) and how they reference secrets, not to introduce a general vault configuration model. The stacked PRs already use this dict in the connection resolution path; the vault backend shape (M) can still be tightened separately if we want.
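To make this concrete, here is a rough sketch of how such a mapping could be consumed (not the code in this PR; the helper name and the base64 decode step are assumptions based on the Databricks SDK secrets API):

```python
import base64

from databricks.sdk import WorkspaceClient


def resolve_connection_properties(ws: WorkspaceClient, creds: dict[str, str]) -> dict[str, str]:
    """Turn 'scope/key' references into secret values (illustrative sketch only)."""
    resolved: dict[str, str] = {}
    for option, reference in creds.items():
        scope, key = reference.split("/", 1)
        # get_secret returns the secret payload base64-encoded
        payload = ws.secrets.get_secret(scope=scope, key=key).value
        resolved[option] = base64.b64decode(payload).decode("utf-8")
    return resolved
```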
…#2159)

## Changes

### What does this PR do?

* Move away from hardcoded secrets in reconcile
* Use credential manager which enables local, env and databricks

### Relevant implementation details

* Add `load_credentials` to `DataSource` which takes care of loading the credentials

### Caveats/things to watch out for when reviewing:

### Linked issues

Progresses #1008, #2123, #2157

### Functionality

- [ ] added relevant user documentation
- [ ] added new CLI command
- [X] modified existing command: `databricks labs lakebridge reconcile`
- [ ] ... +add your own

### Tests

- [ ] manually tested
- [X] added unit tests
- [X] added integration tests

---------

Co-authored-by: Guenia Izquierdo <[email protected]>
# Conflicts:
#   src/databricks/labs/lakebridge/reconcile/connectors/source_adapter.py
This PR has been quite difficult to review; it now spans 41 files with >1.5K LoC. The goals, as stated in the summary, are:
- Implement support for Databricks-hosted secrets.
- Standardize the way the secrets are retrieved for each type of source connection.
- Allow the keys used to look up secrets to be configured by the user, as per #1008.
To achieve this last goal, a major user-facing change has been introduced in the configuration format that users have to use. Before this PR we had:

```yaml
---
data_source: snowflake
secret_scope: remorph_snowflake
# ...
```

As things currently stand this has become:
```yaml
---
data_source: snowflake
creds:
  vault_type: databricks
  vault_secret_names:
    sfUser: remorph_snowflake/sfUser
    sfPassword: remorph_snowflake/sfPassword # Maybe!
    sfUrl: remorph_snowflake/sfUrl
    sfDatabase: remorph_snowflake/sfDatabase
    sfSchema: remorph_snowflake/sfSchema
    sfWarehouse: remorph_snowflake/sfWarehouse
    sfRole: remorph_snowflake/sfRole
# ...
```

Of the changes in this PR, this is actually the most important to get right because it's user-facing and:
- We need to document it;
- We need to support it;
- Changing it is very very difficult (as this PR is evidence of).
From a usability perspective, the big issue is that all the keys are required, and the set is different depending on the source.
As things stand there are also some implementation issues:
- Validation is happening lazily when the credentials are accessed, rather than during initialisation/load.
- A Snowflake user with the old format ends up with a broken one due to migration. The migration runs in-memory on every load, and we generate some keys (`sfPassword` & `pem_private_key`). We then immediately fail by telling the user they need to manually remove one. However, that migrated file hasn't been written back, so the user won't be able to see this and will end up confused.
- For Snowflake it looks like some keys are half-implemented, in particular the things for authenticating with a private key instead of a password.
Part of the issue here is that these changes were never really designed up front from a user-centric perspective, but rather evolved during the implementation. With that in mind, let's specify a design:
- We're introducing a new version of the configuration file, version 2.
- A mandatory new `creds` field that is a mapping containing:
  - `vault_type`: a mandatory string where the only allowed value right now is: `databricks`.
  - `secret_scope`: an optional string, replacing the top-level field. Although marked as optional, it becomes mandatory if:
    - There is no `vault_secret_names` field (below); or
    - Any of the values in the `vault_secret_names` are simple keys without a scope; or
    - A default secret name is needed by the source because `vault_secret_names` does not provide a value.

    When migrating from version 1 this field is populated with the content of the top-level field (that is being removed).
  - `vault_secret_names`: an optional mapping containing source-specific keys that can be used to override the default key used to look up secrets in the vault. The keys here depend on the top-level `data_source` field:
    - Any unknown key (for the given source) must trigger a warning, to help users detect typing errors or misconfiguration.
    - All keys are optional. If not specified, the key itself is used as the value (combined with `secret_scope`).
  - Values in the `creds` section can be either a simple or scoped key for the vault, with `/` as the separator. Scope and vault keys must not contain `/`.
- The top-level `secret_scope` field is removed, superseded by the same value within the `creds` section.
- This will be validated during loading/initialisation (see the sketch just below for what that could look like).
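A minimal sketch of what that load-time validation could look like (illustrative only; I've kept `vault_type` as a plain string here for brevity):

```python
from dataclasses import dataclass, field


@dataclass
class ReconcileCredentialsConfig:
    vault_type: str
    secret_scope: str | None = None
    vault_secret_names: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.vault_type != "databricks":
            raise ValueError(f"Unsupported vault_type: {self.vault_type!r}")
        # Scoped values contain exactly one '/'; neither scope nor key may contain '/'.
        for name in self.vault_secret_names.values():
            if name.count("/") > 1:
                raise ValueError(f"Secret name may contain at most one '/': {name!r}")
        # The scope becomes mandatory as soon as any secret name is (or may default to) unscoped.
        needs_scope = not self.vault_secret_names or any(
            "/" not in name for name in self.vault_secret_names.values()
        )
        if needs_scope and not self.secret_scope:
            raise ValueError("secret_scope is required when any secret name is unscoped or defaulted")
```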
A minimal example that users must provide:
```yaml
---
version: 2
data_source: oracle
creds:
  vault_type: databricks
  secret_scope: some_scope
# ... other fields unrelated to this PR.
```

In this example the Oracle source will use `some_scope/host` as the name of the secret to look up via the Databricks Secrets API for the host to connect to:
- The Oracle-specific name of the connection property needed is `host`.
- The `creds` field does not contain an override, so the default is `host`.
- As a simple name, this is prefixed with the configured scope, leading to `some_scope/host` (see the resolution sketch below).
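A sketch of that per-property resolution rule (the function and parameter names are made up for illustration):

```python
def resolve_secret_name(property_name: str, secret_scope: str | None, overrides: dict[str, str]) -> str:
    """Resolve the vault secret name for one connection property, e.g. 'host'."""
    name = overrides.get(property_name, property_name)  # default: the key itself
    if "/" in name:
        return name  # already scoped, e.g. 'another_scope/the_user'
    if not secret_scope:
        raise ValueError(f"secret_scope is required to resolve {property_name!r}")
    return f"{secret_scope}/{name}"  # simple names get the configured scope prefix


# e.g. resolve_secret_name("host", "some_scope", {}) -> "some_scope/host"
```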
A semi-complete example is:

```yaml
---
version: 2
data_source: oracle
creds:
  vault_type: databricks
  secret_scope: some_scope
  vault_secret_names:
    host: hostname                # host -> some_scope/hostname
    user: another_scope/the_user  # user -> another_scope/the_user
    # No password entry           # password -> some_scope/password
    pasword: the_password         # WARNING emitted: unused/unknown key: 'pasword'
# ... other fields unrelated to this PR.
```

Migration rules mean loading an existing configuration:
```yaml
---
data_source: oracle
secret_scope: a_scope
# ... other fields unrelated to this PR.
```

…would be loaded as:
```yaml
---
version: 2
data_source: oracle
creds:
  vault_type: databricks
  secret_scope: a_scope
```

(This would be interpreted and behave as before without any further intervention.)
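And the version 1 to version 2 migration could be a small rewrite of the raw mapping before deserialisation (a hypothetical helper, not what the PR currently does):

```python
def migrate_reconcile_config_v1_to_v2(raw: dict) -> dict:
    """Rewrite a version 1 reconcile config into the version 2 shape sketched above."""
    if raw.get("version", 1) >= 2:
        return raw  # already current
    migrated = dict(raw)
    migrated["version"] = 2
    creds: dict = {"vault_type": "databricks"}
    scope = migrated.pop("secret_scope", None)  # the top-level field is removed...
    if scope is not None:
        creds["secret_scope"] = scope  # ...and carried into the creds section
    migrated["creds"] = creds
    return migrated
```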
This obviously differs from what is implemented in this PR, and although I have marked up some technical things amongst the changes that have already been made, we need to agree on what it's supposed to be doing before spending more effort on discussing how it does it.
I appreciate that this is different to what is currently implemented, and suggest it may be prudent to deliver this in stages:
- I can see value in leaving out `vault_secret_names` in the first iteration, and then adding it in a second PR to address #1008.
- I can also see value in leaving out the initial stages of support for Snowflake authentication via private keys.
```python
        return cls.from_credentials(credentials, ws)

    @classmethod
    def from_credentials(cls, credentials: dict, ws: WorkspaceClient | None = None) -> "CredentialManager":
```
Let's go with this then:
```diff
-    def from_credentials(cls, credentials: dict, ws: WorkspaceClient | None = None) -> "CredentialManager":
+    def from_credentials(cls, credentials: dict[str, str], ws: WorkspaceClient | None = None) -> "CredentialManager":
```
```python
    vault_type: str
    vault_secret_names: dict[str, str]
```
We seem to have lost some conversations here, but continuing on…
In what way does enum equality not work as expected? (It seems fine to me.)
Let's go with:
```python
from dataclasses import dataclass, field
from enum import Enum, unique


@unique
class VaultType(Enum):
    DATABRICKS = "databricks"


@dataclass
class ReconcileCredentialsConfig:
    vault_type: VaultType = VaultType.DATABRICKS
    # default_factory keeps this optional while avoiding a non-default field
    # after a defaulted one, which dataclasses reject
    vault_secret_names: dict[str, str] = field(default_factory=dict)
```
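For what it's worth, a quick illustration of the enum equality behaving as expected (not part of the PR):

```python
assert VaultType("databricks") is VaultType.DATABRICKS  # lookup by value returns the member
assert VaultType.DATABRICKS == VaultType.DATABRICKS     # member equality works
assert VaultType.DATABRICKS.value == "databricks"       # compare to plain strings via .value
```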
```python
@dataclass
class ReconcileCredentialConfig:
    vault_type: str  # supports local, env, databricks creds.
```
This conversation has disappeared from the review screen, so I've continued it here.
```python
@dataclass
class ReconcileCredentialConfig:
    vault_type: str  # supports local, env, databricks creds.
    source_creds: dict[str, str]
```
This conversation has disappeared from the review screen, so I've continued it here.
```python
        return self._provider.get_secret(key)


def build_credentials(vault_type: str, source: str, credentials: dict) -> dict:
```
Type arguments on those `dict` hints.
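i.e. something along these lines (assuming string-to-string mappings, as used elsewhere in this PR):

```python
def build_credentials(vault_type: str, source: str, credentials: dict[str, str]) -> dict[str, str]:
    ...
```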
```python
            self._creds["pem_private_key"] = SnowflakeDataSource._get_private_key(
                self._creds["pem_private_key"],
                self._creds.get("pem_private_key_password"),
            )
```
This doesn't look right:
- We don't currently support authentication with private keys.
- Some of these vault keys aren't mentioned (or validated) when loading.
- Storing the extracted private key over the top of the encrypted version feels a bit weird.
```python
            )
            creds.vault_secret_names.pop("pem_private_key")

        self._creds_or_empty = load_and_validate_credentials(creds, self._ws, "snowflake")
```
The instance state shouldn't be updated until after it's been fully processed and validated, some of which is happening below. Suggestion:
```python
        loaded_secrets = load_and_validate_credentials(creds, self._ws, "snowflake")
        # ...
        self._creds_or_empty = loaded_secrets
        return self
```

```python
        if not self._creds:
            raise RuntimeError("Credentials not loaded. Please call `load_credentials(ReconcileCredentialsConfig)`.")
```
Isn't this redundant? I think just using self._creds (below) will already trigger the same exception.
```python
        if engine != "databricks":
            assert creds, "Credentials must be provided for non-Databricks sources"
            source.load_credentials(creds)
```
This should be at the top, and raise ValueError rather than use an assertion.
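Roughly (a sketch of the suggested guard, assuming the same surrounding variables):

```python
        # validate the inputs up front instead of asserting later
        if engine != "databricks" and not creds:
            raise ValueError("Credentials must be provided for non-Databricks sources")
```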
```python
@dataclass
class ReconcileCredentialsConfig:
```
This needs to be moved back to config.py I think: it's not just a part of our configuration, but because it is being persisted, any changes need to be handled carefully for compatibility. Migration code might be needed.
If we leave it here I suspect people will modify it without realising how sensitive it is to changes.
```python
                f"/?user={self._creds['sfUser']}&password={self._creds['sfPassword']}"
                f"&db={self._creds['sfDatabase']}&schema={self._creds['sfSchema']}"
                f"&warehouse={self._creds['sfWarehouse']}&role={self._creds['sfRole']}"
            )  # TODO Support PEM key auth
```
This has piqued my curiosity, can you elaborate?
When data is read with JDBC, only password authentication is supported. PEM gets used only when querying the tables' information schema, because JDBC is not used there; JDBC is used in the next steps afterwards.
This leads to an inconsistent UX, hence the TODO to support PEM as well, given the docs mention that both PEM and password auth are supported. Currently, the user gets a warning log that the password is required.
No, we support PEM as well. Our current labs infra setup is using a PEM file.
I see this code on the main branch at databricks/labs/lakebridge/reconcile/connectors/snowflake.py:67; this code gets used when reading data:

```python
    def get_jdbc_url(self) -> str:
        try:
            sf_password = self._get_secret('sfPassword')
        except (NotFound, KeyError) as e:
            message = "sfPassword is mandatory for jdbc connectivity with Snowflake."
```
# Conflicts:
#   src/databricks/labs/lakebridge/config.py
#   tests/integration/reconcile/connectors/test_read_schema.py
…actor/creds-manager
## Changes

### What does this PR do?

### Linked issues

Progresses #1008

### Functionality

`databricks labs lakebridge ...`

### Tests