dlt version
1.27.0
Describe the problem
TL;DR: On long-running jobs, the S3 credentials used by DuckDB's sql_client can expire (after ~8h for us) because they are never refreshed.
When reading a filesystem dataset via dlt.dataset(...).sql_client (DuckDB), the S3 secret is created once on open_connection. For default-chain AWS credentials, create_secret in dlt/destinations/impl/duckdb/sql_client.py emits:
```sql
CREATE OR REPLACE SECRET ... (
    TYPE S3,
    PROVIDER credential_chain,
    REGION '...', ENDPOINT '...', SCOPE '...', URL_STYLE '...', USE_SSL ...
)
```
This resolves the credential chain once and never refreshes. On a long-held connection (we stream Arrow batches for hours while a slow downstream consumes them), the temporary ECS task-role token expires and the next lazy GET fails:
```
OSError: HTTP Error: HTTP GET error on 's3://.../part-00010-....parquet' (HTTP 400)
```
DuckDB supports REFRESH auto on credential_chain secrets, which re-runs the chain when credentials expire. dlt never sets it, so there's no way to keep a long-lived sql_client reading past the credential TTL without manually re-creating the secret.
Expected behavior
Secrets created from credential_chain should include REFRESH auto (or expose an option to enable it), so long-running reads on temporary credentials don't die with HTTP 400 on expiry.
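A minimal sketch of what the emitted statement could look like with such an option. `build_s3_chain_secret_sql` and its `refresh_auto` flag are hypothetical names for illustration, not dlt's current `create_secret` signature; the optional settings mirror the ones dlt already emits (REGION, ENDPOINT, SCOPE, URL_STYLE, USE_SSL), plus DuckDB's `REFRESH auto`.

```python
from typing import Optional


def _sql_string(value: str) -> str:
    # Quote a SQL string literal, doubling any embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def build_s3_chain_secret_sql(
    secret_name: str,
    region: Optional[str] = None,
    endpoint: Optional[str] = None,
    scope: Optional[str] = None,
    url_style: Optional[str] = None,
    use_ssl: Optional[bool] = None,
    refresh_auto: bool = True,  # hypothetical flag; True is the proposed default
) -> str:
    """Hypothetical builder for a credential_chain S3 secret (not dlt's API)."""
    options = ["TYPE S3", "PROVIDER credential_chain"]
    if refresh_auto:
        # Proposed change: let DuckDB re-run the credential chain on expiry.
        options.append("REFRESH auto")
    # Same optional settings dlt already emits; each is omitted when not configured.
    for key, value in (
        ("REGION", region),
        ("ENDPOINT", endpoint),
        ("SCOPE", scope),
        ("URL_STYLE", url_style),
    ):
        if value is not None:
            options.append(f"{key} {_sql_string(value)}")
    if use_ssl is not None:
        options.append(f"USE_SSL {'true' if use_ssl else 'false'}")
    body = ",\n    ".join(options)
    return f"CREATE OR REPLACE SECRET {secret_name} (\n    {body}\n)"


print(build_s3_chain_secret_sql("my_secret", region="eu-central-1", scope="s3://my-bucket"))
```

Whether `REFRESH auto` should be on by default or opt-in is a call for the maintainers; the sketch defaults it to on because that is what the expected behavior asks for.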
Steps to reproduce
- Run on ECS/Lambda (temporary role creds) or any setup where the default chain returns a short-lived token.
- Open a filesystem dataset and stream slowly:
```python
import time

import dlt

ds = dlt.dataset(
    destination=dlt.destinations.filesystem(bucket_url="s3://my-bucket/path"),
    dataset_name="my_dataset",
)

with ds.sql_client as client:
    with client.execute_query("SELECT * FROM my_table") as cursor:
        for batch in cursor.iter_arrow(chunk_size=5000):
            time.sleep(...)  # simulate a slow consumer; hold the connection past the credential TTL
```
- Once the token expires, the next S3 GET fails with HTTP 400.
Workaround: after opening the connection, run CREATE OR REPLACE SECRET <dlt_secret_name> (TYPE S3, PROVIDER credential_chain, REFRESH auto, ...) to override dlt's secret.
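The workaround can be sketched as a small helper. `override_secret_with_refresh` is a hypothetical name; it takes any callable that runs one SQL statement (for example the sql_client's `execute_sql` method, assumed here to be available on dlt's DuckDB client as on other dlt SQL clients), so it can be exercised without AWS. The secret name and the REGION/SCOPE values are assumptions you must copy from the secret dlt created, so the override targets the same bucket.

```python
from typing import Any, Callable, Optional


def override_secret_with_refresh(
    execute: Callable[[str], Any],
    secret_name: str,
    region: Optional[str] = None,
    scope: Optional[str] = None,
) -> str:
    """Re-create an S3 credential_chain secret with REFRESH auto (hypothetical helper).

    `execute` runs a single SQL statement, e.g. client.execute_sql.
    Returns the statement that was executed.
    """

    def quote(value: str) -> str:
        # SQL string literal with embedded single quotes doubled.
        return "'" + value.replace("'", "''") + "'"

    options = ["TYPE S3", "PROVIDER credential_chain", "REFRESH auto"]
    if region is not None:
        options.append(f"REGION {quote(region)}")
    if scope is not None:
        options.append(f"SCOPE {quote(scope)}")
    sql = f"CREATE OR REPLACE SECRET {secret_name} ({', '.join(options)})"
    execute(sql)  # replaces the secret dlt created under the same name
    return sql
```

Call it right after entering `with ds.sql_client as client:` and before `execute_query`, e.g. `override_secret_with_refresh(client.execute_sql, "<dlt_secret_name>", region="...", scope="s3://my-bucket")`, with the placeholders filled in from your own setup.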
Operating system
Linux
Runtime environment
AWS ECS (Fargate)
Python version
3.13
dlt data source
filesystem (S3, DuckDB SQL client / dlt.dataset)