filesystem SQL client S3 secret uses credential_chain without REFRESH, so temp creds aren't refreshed on long reads #3987

@JonnyTran

dlt version

1.27.0

Describe the problem

TL;DR: On long-running jobs, the temporary S3 credentials used by DuckDB's sql_client can expire after ~8h, because the secret is never refreshed.

When reading a filesystem dataset via dlt.dataset(...).sql_client (DuckDB), the S3 secret is created once on open_connection. For default-chain AWS credentials, create_secret in dlt/destinations/impl/duckdb/sql_client.py emits:

CREATE OR REPLACE SECRET ... (
    TYPE S3,
    PROVIDER credential_chain,
    REGION '...', ENDPOINT '...', SCOPE '...', URL_STYLE '...', USE_SSL ...
)

This resolves the credential chain once and never refreshes. On a long-held connection (we stream Arrow batches for hours while a slow downstream consumes them), the temporary ECS task-role token expires and the next lazy GET fails:

OSError: HTTP Error: HTTP GET error on 's3://.../part-00010-....parquet' (HTTP 400)

DuckDB supports REFRESH auto on credential_chain secrets, which re-runs the chain when credentials expire. dlt never sets it, so there's no way to keep a long-lived sql_client reading past the credential TTL without manually re-creating the secret.
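
To illustrate the difference between resolving the chain once and re-running it on expiry, here is a toy model. It is not DuckDB or dlt code: `CachedCredentials`, `fake_chain`, and the one-hour TTL are all made up for the illustration, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Token:
    value: str
    expires_at: float  # seconds on a simulated clock


def fake_chain(now: float, ttl: float = 3600.0) -> Token:
    """Hypothetical credential chain: issues a token valid for `ttl` seconds."""
    return Token(value=f"token-issued-at-{int(now)}", expires_at=now + ttl)


class CachedCredentials:
    """Toy stand-in for a secret backed by a credential chain."""

    def __init__(self, resolve: Callable[[float], Token], now: float, refresh: bool):
        self._resolve = resolve
        self._refresh = refresh
        # The chain is resolved once, when the secret is created.
        self._token = resolve(now)

    def get(self, now: float) -> Token:
        if now >= self._token.expires_at:
            if not self._refresh:
                # Without refresh the stale token is all we have.
                raise PermissionError("token expired and refresh is disabled")
            # With refresh the chain is re-run to obtain a fresh token.
            self._token = self._resolve(now)
        return self._token


static = CachedCredentials(fake_chain, now=0.0, refresh=False)
refreshing = CachedCredentials(fake_chain, now=0.0, refresh=True)

print(refreshing.get(4000.0).value)  # a new token, issued after the first expired
try:
    static.get(4000.0)
except PermissionError as exc:
    print(f"static secret failed: {exc}")
```

The `refresh=False` branch corresponds to what the issue describes: the token captured at connection time is reused until it stops working.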

Expected behavior

Secrets created from credential_chain should include REFRESH auto (or expose an option to enable it), so long-running reads on temporary credentials don't die with HTTP 400 on expiry.

Steps to reproduce

  1. Run on ECS/Lambda (temporary role creds) or any setup where the default chain returns a short-lived token.
  2. Open a filesystem dataset and stream slowly:
import time

import dlt

ds = dlt.dataset(
    destination=dlt.destinations.filesystem(bucket_url="s3://my-bucket/path"),
    dataset_name="my_dataset",
)
# The S3 secret is created once, when the connection is opened here.
with ds.sql_client as client:
    with client.execute_query("SELECT * FROM my_table") as cursor:
        for batch in cursor.iter_arrow(chunk_size=5000):
            time.sleep(...)  # simulate slow consumer; hold the connection past creds TTL
  3. Once the token expires, the next S3 GET fails with HTTP 400.

Workaround: after opening the connection, run CREATE OR REPLACE SECRET <dlt_secret_name> (TYPE S3, PROVIDER credential_chain, REFRESH auto, ...) to override the secret that dlt created.

Operating system

Linux

Runtime environment

AWS ECS (Fargate)

Python version

3.13

dlt data source

filesystem (S3, DuckDB SQL client / dlt.dataset)

Labels

bug, destination
