fix(duckdb): refresh credential_chain secrets to survive temp-token expiry (#3987) #4021

Open
0ywfe wants to merge 2 commits into dlt-hub:devel from 0ywfe:fix/3987-duckdb-credential-chain-refresh

0ywfe commented Jun 3, 2026

Summary

Closes #3987 — DuckDB S3 secrets created via PROVIDER credential_chain never refresh, so long-held sql_client connections die with HTTP 400 ExpiredToken once the temporary AWS token rotates (typical lifetime ~8h on ECS / EKS IRSA / EC2 instance profile).

DuckDB 1.1.0 added REFRESH auto on credential_chain secrets, which re-runs the provider chain on expiry. This is the DuckDB-documented best practice for any long-running consumer of credential_chain — without it the whole point of using a refreshable chain is defeated by the snapshot-once behaviour.

Change

In dlt/destinations/impl/duckdb/sql_client.py, the credential_chain branch of create_secret now emits a REFRESH auto clause as part of the CREATE OR REPLACE SECRET statement. The clause is version-gated via the already-imported packaging.version.Version, so installations on duckdb < 1.1.0 keep the prior SQL and don't hit a parse error (dlt's minimum is still duckdb>=0.9).

Only the credential_chain branch is touched. The static-credentials branch (KEY_ID / SECRET / SESSION_TOKEN) is unchanged — those values are fixed inputs from the caller, not resolved from a refreshable provider chain, so REFRESH would be a no-op there.

Diff shape

             if isinstance(aws_creds, AwsCredentials) and aws_creds.has_default_credentials():
-                # let DuckDB resolve credentials from botocore's default chain
+                # let DuckDB resolve credentials from botocore's default chain.
+                # REFRESH auto re-runs the chain when temporary credentials expire ...
+                refresh_stmt = (
+                    "REFRESH auto,"
+                    if Version(duckdb.__version__) >= Version("1.1.0")
+                    else ""
+                )
                 sql.append(f"""
                 CREATE OR REPLACE {persistent_stmt} SECRET {secret_name} (
                     TYPE S3,
                     PROVIDER credential_chain,
+                    {refresh_stmt}
                     REGION '{aws_creds.region_name}',
                     ...
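
For readers who want to try the gating logic outside dlt, here is a minimal standalone sketch of the same idea. It is not the code in sql_client.py: the function names, the constant, and the secret and region values are hypothetical, and parse_release is a dependency-free stand-in for packaging.version.Version, which the PR actually uses. Only the options visible in the diff above are emitted, and the interpolated values are assumed to be trusted.

```python
import re

# Threshold taken from the PR description (REFRESH gated on duckdb >= 1.1.0).
REFRESH_MIN_RELEASE = (1, 1, 0)


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release of a version string into a tuple of ints.

    Dependency-free stand-in for packaging.version.Version. Unlike Version,
    it ignores pre-release and dev suffixes, so "1.1.0.dev5" parses as (1, 1, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as "1.1" so tuple comparison treats them as 1.1.0.
    return parts + (0,) * max(0, 3 - len(parts))


def build_credential_chain_secret_sql(
    secret_name: str, region: str, duckdb_version: str, persistent: bool = False
) -> str:
    """Build a CREATE SECRET statement for the credential_chain provider."""
    options = ["TYPE S3", "PROVIDER credential_chain"]
    if parse_release(duckdb_version) >= REFRESH_MIN_RELEASE:
        # Ask DuckDB to re-run the provider chain when credentials expire.
        options.append("REFRESH auto")
    options.append(f"REGION '{region}'")
    persistent_stmt = "PERSISTENT " if persistent else ""
    body = ",\n    ".join(options)
    return f"CREATE OR REPLACE {persistent_stmt}SECRET {secret_name} (\n    {body}\n);"


if __name__ == "__main__":
    # Newer DuckDB: the statement includes REFRESH auto.
    print(build_credential_chain_secret_sql("example_secret", "eu-central-1", "1.5.3"))
    # Older DuckDB: the clause is omitted so the statement still parses.
    print(build_credential_chain_secret_sql("example_secret", "eu-central-1", "0.10.2"))
```

Joining a list of options avoids the whitespace-only line that an empty refresh_stmt leaves in the f-string on older versions. That line is harmless to DuckDB, so this is a stylistic alternative rather than a fix.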

Test plan

  • Smoke-tested locally that the emitted SQL contains the REFRESH auto clause when duckdb.__version__ >= 1.1.0 and omits it on older versions
  • Verified the resulting SQL parses and executes cleanly against duckdb 1.5.3 (CREATE OR REPLACE SECRET ... PROVIDER credential_chain, REFRESH auto, ... is accepted)
  • Existing integration tests in tests/load/filesystem/test_sql_client.py exercise the create_secret path and should continue to pass — the change is strictly additive SQL text
  • Workaround in the issue body confirms the same approach: CREATE OR REPLACE SECRET <dlt_secret_name> (TYPE S3, PROVIDER credential_chain, REFRESH auto, ...) — exactly what this PR emits
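
The first check in the plan could be sketched as a small assertion helper along these lines. The helper name and the two sample statements are illustrative stand-ins written by hand; they are not taken from dlt's test suite or from the SQL dlt actually emits.

```python
import re

# Matches the clause regardless of case or of the whitespace between the two words.
_REFRESH_AUTO = re.compile(r"\bREFRESH\s+auto\b", re.IGNORECASE)


def has_refresh_auto(sql: str) -> bool:
    """Return True if the SQL text contains a REFRESH auto clause."""
    return _REFRESH_AUTO.search(sql) is not None


# Hand-written stand-in for the SQL expected on duckdb >= 1.1.0.
SQL_WITH_REFRESH = """
CREATE OR REPLACE SECRET example_secret (
    TYPE S3,
    PROVIDER credential_chain,
    REFRESH auto,
    REGION 'eu-central-1'
);
"""

# Hand-written stand-in for the SQL expected on older duckdb versions.
SQL_WITHOUT_REFRESH = """
CREATE OR REPLACE SECRET example_secret (
    TYPE S3,
    PROVIDER credential_chain,
    REGION 'eu-central-1'
);
"""

if __name__ == "__main__":
    assert has_refresh_auto(SQL_WITH_REFRESH)
    assert not has_refresh_auto(SQL_WITHOUT_REFRESH)
    print("smoke checks passed")
```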

Why not opt-in?

The issue reporter offered both ("REFRESH auto ... or expose an option to enable it"). I went with default-on because:

  1. REFRESH auto is the DuckDB-documented best practice for credential_chain — snapshot-once defeats the purpose of using a refreshable chain.
  2. An opt-in flag would require every user who hits ExpiredToken to discover the flag, which is exactly the discovery problem this issue reports.
  3. The behaviour is strictly additive — users who don't have expiring credentials see no change.

Happy to convert to an opt-in flag in a follow-up if the maintainers prefer that shape, but the default-on path matches the bug's framing as a correctness regression rather than a feature request.

Out of scope

0ywfe added 2 commits June 3, 2026 15:46
…xpiry (dlt-hub#3987)

When dlt opens a DuckDB S3 secret via PROVIDER credential_chain, the chain
resolves once at secret creation and the resolved temporary token is then
held for the lifetime of the secret. On long-held sql_client connections
(e.g. streaming Arrow batches for hours from ECS task roles, EKS IRSA, or
EC2 instance profile), the temporary token expires and subsequent S3 GETs
fail with HTTP 400 ExpiredToken.

DuckDB 1.1.0 added REFRESH auto on credential_chain secrets, which re-runs
the provider chain on expiry. This is the DuckDB-documented best practice
for any long-running consumer of credential_chain.

This change adds REFRESH auto to the credential_chain branch of
create_secret in dlt/destinations/impl/duckdb/sql_client.py, version-gated
via the already-imported packaging.version.Version so installations on
duckdb < 1.1.0 (dlt's minimum is still 0.9) keep the prior SQL and don't
hit a parse error.

Only the credential_chain branch is touched. The static-credentials branch
(KEY_ID / SECRET / SESSION_TOKEN) does not need REFRESH because those
values are fixed inputs from the caller, not resolved from a refreshable
provider chain.

Closes dlt-hub#3987

Development

Successfully merging this pull request may close these issues.

filesystem SQL client S3 secret uses credential_chain without REFRESH, so temp creds aren't refreshed on long reads
