fix(duckdb): refresh credential_chain secrets to survive temp-token expiry (#3987)#4021
Open
0ywfe wants to merge 2 commits into
Open
fix(duckdb): refresh credential_chain secrets to survive temp-token expiry (#3987)#40210ywfe wants to merge 2 commits into
0ywfe wants to merge 2 commits into
Conversation
…xpiry (dlt-hub#3987) When dlt opens a DuckDB S3 secret via PROVIDER credential_chain, the chain resolves once at secret creation and the resolved temporary token is then held for the lifetime of the secret. On long-held sql_client connections (e.g. streaming Arrow batches for hours from ECS task roles, EKS IRSA, or EC2 instance profile), the temporary token expires and subsequent S3 GETs fail with HTTP 400 ExpiredToken. DuckDB 1.1.0 added REFRESH auto on credential_chain secrets, which re-runs the provider chain on expiry. This is the DuckDB-documented best practice for any long-running consumer of credential_chain. This change adds REFRESH auto to the credential_chain branch of create_secret in dlt/destinations/impl/duckdb/sql_client.py, version-gated via the already-imported packaging.version.Version so installations on duckdb < 1.1.0 (dlt's minimum is still 0.9) keep the prior SQL and don't hit a parse error. Only the credential_chain branch is touched. The static-credentials branch (KEY_ID / SECRET / SESSION_TOKEN) does not need REFRESH because those values are fixed inputs from the caller, not resolved from a refreshable provider chain. Closes dlt-hub#3987
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #3987 — DuckDB S3 secrets created via
PROVIDER credential_chainnever refresh, so long-heldsql_clientconnections die with HTTP 400ExpiredTokenonce the temporary AWS token rotates (typical lifetime ~8h on ECS / EKS IRSA / EC2 instance profile).DuckDB 1.1.0 added
REFRESH autoon credential_chain secrets, which re-runs the provider chain on expiry. This is the DuckDB-documented best practice for any long-running consumer of credential_chain — without it the whole point of using a refreshable chain is defeated by the snapshot-once behaviour.Change
In
dlt/destinations/impl/duckdb/sql_client.py, thecredential_chainbranch ofcreate_secretnow emitsREFRESH auto,as part of theCREATE OR REPLACE SECRETSQL. Version-gated via the already-importedpackaging.version.Versionso installations on duckdb < 1.1.0 keep the prior SQL and don't hit a parse error (dlt's minimum is stillduckdb>=0.9).Only the
credential_chainbranch is touched. The static-credentials branch (KEY_ID/SECRET/SESSION_TOKEN) is unchanged — those values are fixed inputs from the caller, not resolved from a refreshable provider chain, so REFRESH would be a no-op there.Diff shape
Test plan
REFRESH auto,whenduckdb.__version__ >= 1.1.0and omits it on older versionsduckdb 1.5.3(CREATE OR REPLACE SECRET ... PROVIDER credential_chain, REFRESH auto, ...is accepted)tests/load/filesystem/test_sql_client.pyexercise thecreate_secretpath and should continue to pass — the change is strictly additive SQL textCREATE OR REPLACE SECRET <dlt_secret_name> (TYPE S3, PROVIDER credential_chain, REFRESH auto, ...)— exactly what this PR emitsWhy not opt-in?
The issue reporter offered both ("
REFRESH auto... or expose an option to enable it"). I went with default-on because:REFRESH autois the DuckDB-documented best practice for credential_chain — snapshot-once defeats the purpose of using a refreshable chain.Happy to convert to an opt-in flag in a follow-up if the maintainers prefer that shape, but the default-on path matches the bug's framing as a correctness regression rather than a feature request.
Out of scope
filesystemdestination has a related but distinct issue (filesystem destination: s3fs credentials frozen on snapshot, no refresh — ExpiredToken on long S3 writes (ECS/IRSA/EC2 task roles) #4003) wheres3fssnapshots credentials inAwsCredentials._from_sessionand similarly never refreshes. That fix lives outsidesql_client.pyand is a separate PR.