
filesystem destination: s3fs credentials frozen on snapshot, no refresh — ExpiredToken on long S3 writes (ECS/IRSA/EC2 task roles) #4003

@tderk

Summary

When the filesystem destination resolves AWS credentials via boto3's default provider chain (typical for ECS task roles, EKS IRSA, and EC2 instance profiles), `AwsCredentials._from_session` snapshots `default.access_key`, `default.secret_key`, and `default.token` into plain strings and passes them to s3fs as static kwargs. Once the cloud provider rotates the temporary token mid-process, dlt keeps sending the stale snapshot, and S3 `PutObject` fails with `ExpiredToken`.
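
A toy model shows the difference between copying credential values and holding a reference to their source. Everything here is hypothetical: `RotatingProvider` stands in for boto3's provider chain, and `rotate()` stands in for the cloud provider issuing a new token. Neither is a botocore API.

```python
from dataclasses import dataclass


class RotatingProvider:
    """Hypothetical stand-in for a credential provider that rotates its token."""

    def __init__(self) -> None:
        self._generation = 0

    @property
    def token(self) -> str:
        # Always reflects the provider's current token
        return f"token-{self._generation}"

    def rotate(self) -> None:
        # Simulates the cloud provider issuing a fresh session token
        self._generation += 1


@dataclass
class SnapshotClient:
    """Copies the token once, like passing plain strings to s3fs."""
    token: str


class LiveClient:
    """Keeps a reference to the provider and reads the token on every use."""

    def __init__(self, provider: RotatingProvider) -> None:
        self._provider = provider

    @property
    def token(self) -> str:
        return self._provider.token


provider = RotatingProvider()
snapshot = SnapshotClient(token=provider.token)
live = LiveClient(provider)

provider.rotate()
print(snapshot.token)  # token-0 (stale)
print(live.token)      # token-1 (current)
```

The snapshot keeps `token-0` no matter how often the provider rotates. The live client picks up `token-1` without any extra code.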

Related but distinct from #3987 (DuckDB SQL client read path). This issue covers the write path via s3fs in the filesystem destination.

Environment

  • dlt 1.x (verified on devel, also reproduces on dlt-hub/dlt@dc01193839 aka iceberg-streaming-fix)
  • AWS Batch on Fargate with IAM Task Role (temporary STS credentials)
  • Long-running iceberg load (~1.5h wall clock)

Reproducer

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="long_load",
    destination=dlt.destinations.filesystem(),  # bucket_url=s3://...
    dataset_name="test",
)
# Run any load that takes longer than the task role's session TTL;
# big_iceberg_source() is a placeholder for such a source (not defined here)
pipeline.run(big_iceberg_source(), table_format="iceberg")
```

On any host where boto3's default chain returns refreshable credentials (ECS metadata endpoint, EKS IRSA web identity, EC2 IMDS, SSO), the run dies at the first PutObject after the provider rotates the underlying token.
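
Under snapshot semantics, the write that fails is the first one issued at or after the copied token stops being valid. The following is a minimal sketch under stated assumptions: a single token issued at `t=0`, a fixed lifetime, and a snapshot that stays valid until that lifetime ends. The lifetime and write cadence are hypothetical numbers chosen for illustration, not real ECS or IRSA defaults.

```python
from typing import Optional, Sequence


def first_failing_write(write_times_s: Sequence[float], token_ttl_s: float) -> Optional[int]:
    """Return the index of the first write made with an expired snapshot token.

    Assumes the snapshot token was issued at t=0 and is never refreshed, so
    every write at t >= token_ttl_s is rejected (ExpiredToken). Returns None
    if all writes finish before the token expires.
    """
    for i, t in enumerate(write_times_s):
        if t >= token_ttl_s:
            return i
    return None


# Hypothetical load: one PutObject every 10 minutes for ~1.5 h of wall clock
writes = [i * 600.0 for i in range(10)]  # 0 s, 600 s, ..., 5400 s

print(first_failing_write(writes, token_ttl_s=3600.0))  # 6 (the write at t=3600 s)
print(first_failing_write(writes, token_ttl_s=7200.0))  # None (load fits in one token lifetime)
```

This is why short loads never show the bug and long ones hit it reliably. Whether a run fails depends only on whether it outlasts the lifetime of the token it started with.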

Traceback

```
botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.
[...]
File "/app/.venv/lib/python3.11/site-packages/s3fs/core.py", line 147, in _error_wrapper
PermissionError: The provided token has expired.
[...]
dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at `step=load` ...
```

Root cause

`dlt/common/configuration/specs/aws_credentials.py`:

```python
def _from_session(self, session):
    ...
    default = session.get_credentials()
    if not default:
        return None
    self.aws_access_key_id = default.access_key        # ← snapshot
    self.aws_secret_access_key = TSecretStrValue(default.secret_key)
    self.aws_session_token = cast(TSecretStrValue, default.token)
    return default
```

`default` is typically a `botocore.credentials.RefreshableCredentials` instance whose `.access_key/.secret_key/.token` properties resolve transparently against the live provider chain. By extracting them into plain strings we lose the refresh mechanism.

`AwsCredentialsWithoutDefaults.to_s3fs_credentials` then passes those frozen strings as `key=`, `secret=`, `token=` to `s3fs`, which creates an aiobotocore client with static credentials.
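
The refresh-on-access behaviour of `RefreshableCredentials` can be imitated in a few lines: every property read checks the expiry and calls a refresh callback when the token is close to expiring. This is a hypothetical simplification. `LazyRefreshingCredentials`, its `refresh_window_s` parameter, and the injected clock are invented for illustration. botocore's real class uses its own advisory and mandatory refresh timeouts plus locking.

```python
from typing import Callable, Tuple

# (access_key, secret_key, token, expiry_epoch_s)
Issued = Tuple[str, str, str, float]


class LazyRefreshingCredentials:
    """Hypothetical simplification of the refresh-on-access pattern."""

    def __init__(self, fetch: Callable[[], Issued], clock: Callable[[], float],
                 refresh_window_s: float = 300.0) -> None:
        self._fetch = fetch
        self._clock = clock
        self._window = refresh_window_s
        self._access_key, self._secret_key, self._token, self._expiry = fetch()

    def _maybe_refresh(self) -> None:
        # Fetch new credentials once the current ones are within the window of expiry
        if self._clock() >= self._expiry - self._window:
            self._access_key, self._secret_key, self._token, self._expiry = self._fetch()

    @property
    def access_key(self) -> str:
        self._maybe_refresh()
        return self._access_key

    @property
    def secret_key(self) -> str:
        self._maybe_refresh()
        return self._secret_key

    @property
    def token(self) -> str:
        self._maybe_refresh()
        return self._token


now = [0.0]   # mutable fake clock, in seconds
issued = [0]  # number of times credentials were fetched


def fetch() -> Issued:
    issued[0] += 1
    n = issued[0]
    return (f"AKIA{n}", f"secret{n}", f"token{n}", now[0] + 3600.0)


creds = LazyRefreshingCredentials(fetch, clock=lambda: now[0])
frozen_token = creds.token  # what a plain-string snapshot keeps

now[0] = 3400.0             # inside the 300 s refresh window
print(frozen_token)         # token1
print(creds.token)          # token2
```

Reading `creds.token` after the clock advances triggers a refresh. The string copied earlier cannot change, which is the information `_from_session` throws away.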

Impact

  • Any dlt pipeline using filesystem destination on AWS with temporary STS credentials (IRSA, ECS task role, EC2 instance profile, SSO, role assumption with session token) hits this once the run outlasts a single rotation cycle of the underlying provider.
  • For long-running migrations / large iceberg loads this is hit reliably.

Proposed fix

In `AwsCredentials` (the subclass with default resolution), detect when credentials came from `RefreshableCredentials` and omit `key/secret/token` from `to_s3fs_credentials`. s3fs then falls back to its own `AioSession()`, which honors the underlying provider and refreshes transparently.

Sketch:

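```python
# Sketch: lives in dlt/common/configuration/specs/aws_credentials.py,
# where configspec and both base classes are already in scope
@configspec
class AwsCredentials(AwsCredentialsWithoutDefaults, CredentialsWithDefault):
    _credentials_are_refreshable: bool = False

    def _from_session(self, session):
        from botocore.credentials import RefreshableCredentials
        default = super()._from_session(session)
        if default is not None:
            # Remember whether boto3 returned self-refreshing credentials
            self._credentials_are_refreshable = isinstance(default, RefreshableCredentials)
        return default

    def to_s3fs_credentials(self):
        creds = super().to_s3fs_credentials()
        if self._credentials_are_refreshable:
            # Drop the frozen values so s3fs creates its own AioSession,
            # which resolves and refreshes credentials via the provider chain
            for k in ("key", "secret", "token"):
                creds.pop(k, None)
        return creds
```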

Same freeze likely affects `to_pyiceberg_fileio_config` and `to_object_store_rs_credentials` — could be addressed in the same PR or separately.
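
If the other two converters get the same treatment, the stripping step could be shared. The following is a minimal sketch of such a helper. `strip_static_credentials` is hypothetical. The key names in `STATIC_KEYS` are assumptions: s3fs kwargs as in the sketch above, PyIceberg's `s3.*` FileIO properties, and object_store's `aws_*` config keys. They would need to be checked against what each dlt method actually emits. Whether each library then falls back to a refreshing provider chain is also an assumption.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple

# Assumed names of the static credential keys in each config dict
STATIC_KEYS: Mapping[str, Tuple[str, ...]] = {
    "s3fs": ("key", "secret", "token"),
    "pyiceberg": ("s3.access-key-id", "s3.secret-access-key", "s3.session-token"),
    "object_store": ("aws_access_key_id", "aws_secret_access_key", "aws_session_token"),
}


def strip_static_credentials(config: Mapping[str, Any], static_keys: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of config without the static credential keys.

    Meant for the refreshable case only, so the consuming library resolves
    credentials itself instead of using a frozen snapshot. The input is
    left unmodified.
    """
    drop = set(static_keys)
    return {k: v for k, v in config.items() if k not in drop}


cfg = {
    "s3.access-key-id": "AKIAEXAMPLE",
    "s3.secret-access-key": "example-secret",
    "s3.session-token": "example-token",
    "s3.region": "eu-central-1",
}
print(strip_static_credentials(cfg, STATIC_KEYS["pyiceberg"]))
# {'s3.region': 'eu-central-1'}
```

Returning a copy rather than popping in place keeps the helper safe to call on dicts that other code may still reference.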
