Summary
When the filesystem destination resolves AWS credentials via boto3's default provider chain (typical for ECS task roles, EKS IRSA, EC2 instance profiles), `AwsCredentials._from_session` snapshots `default.access_key/secret_key/token` into plain strings and passes them to `s3fs` as static kwargs. After the cloud provider rotates the temporary token mid-process, dlt continues sending the stale snapshot, and S3 PutObject fails with `ExpiredToken`.
Related but distinct from #3987 (DuckDB SQL client read path). This issue covers the write path via `s3fs` in the filesystem destination.
Environment
- dlt 1.x (verified on `devel`, also reproduces on dlt-hub/dlt@dc01193839 aka `iceberg-streaming-fix`)
- AWS Batch on Fargate with IAM Task Role (temporary STS credentials)
- Long-running iceberg load (~1.5h wall clock)
Reproducer
import dlt

pipeline = dlt.pipeline(
    pipeline_name="long_load",
    destination=dlt.destinations.filesystem(),  # bucket_url=s3://...
    dataset_name="test",
)
# Run any load that takes longer than the task role's session TTL
pipeline.run(big_iceberg_source(), table_format="iceberg")
On any host where boto3's default chain returns refreshable credentials (ECS metadata endpoint, EKS IRSA web identity, EC2 IMDS, SSO), the run dies at the first PutObject after the provider rotates the underlying token.
Traceback
botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.
[...]
File "/app/.venv/lib/python3.11/site-packages/s3fs/core.py", line 147, in _error_wrapper
PermissionError: The provided token has expired.
[...]
dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at `step=load` ...
Root cause
`dlt/common/configuration/specs/aws_credentials.py`:
def _from_session(self, session):
    ...
    default = session.get_credentials()
    if not default:
        return None
    self.aws_access_key_id = default.access_key  # ← snapshot
    self.aws_secret_access_key = TSecretStrValue(default.secret_key)
    self.aws_session_token = cast(TSecretStrValue, default.token)
    return default
`default` is typically a `botocore.credentials.RefreshableCredentials` instance whose `.access_key/.secret_key/.token` properties resolve transparently against the live provider chain. By extracting them into plain strings we lose the refresh mechanism.
`AwsCredentialsWithoutDefaults.to_s3fs_credentials` then passes those frozen strings as `key=`, `secret=`, `token=` to `s3fs`, which creates an aiobotocore client with static credentials.
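The effect of the snapshot can be illustrated without AWS. The block below is a toy model only, not botocore or dlt code: `FakeRefreshableCredentials`, its `ttl` and `clock` parameters, and `simulate_long_run` are hypothetical names that mimic a credentials object whose `.token` property refreshes on read.

```python
import itertools


class FakeRefreshableCredentials:
    """Hypothetical stand-in for credentials that refresh on attribute read.

    This is NOT botocore's implementation; it only models the idea that
    reading `.token` always yields a currently valid value.
    """

    def __init__(self, ttl, clock):
        self._ttl = ttl              # token lifetime in seconds
        self._clock = clock          # callable returning the current time
        self._counter = itertools.count(1)
        self._token = None
        self._expires_at = None

    @property
    def token(self):
        now = self._clock()
        # Issue a new token on first read or once the current one expired.
        if self._token is None or now >= self._expires_at:
            self._token = f"token-{next(self._counter)}"
            self._expires_at = now + self._ttl
        return self._token

    def is_valid(self, token):
        # A token is accepted only if it is the current one and not expired.
        return token == self._token and self._clock() < self._expires_at


def simulate_long_run(ttl, run_seconds):
    """Return (snapshot_still_valid, live_read_valid) after `run_seconds`."""
    now = [0]
    creds = FakeRefreshableCredentials(ttl=ttl, clock=lambda: now[0])
    snapshot = creds.token           # analogous to copying into a plain string
    now[0] = run_seconds             # the load keeps running
    live = creds.token               # reading the property again refreshes
    return creds.is_valid(snapshot), creds.is_valid(live)
```

With a one-hour lifetime and a 1.5 h run, `simulate_long_run(3600, 5400)` reports the snapshot as invalid while the live read stays valid, which mirrors the `ExpiredToken` failure described above.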
Impact
- Any dlt pipeline using filesystem destination on AWS with temporary STS credentials (IRSA, ECS task role, EC2 instance profile, SSO, role assumption with session token) hits this once the run outlasts a single rotation cycle of the underlying provider.
- For long-running migrations / large iceberg loads this is hit reliably.
Proposed fix
In `AwsCredentials` (the subclass with default resolution), detect when credentials came from `RefreshableCredentials` and omit `key/secret/token` from `to_s3fs_credentials`. s3fs then falls back to its own `AioSession()`, which honors the underlying provider and refreshes transparently.
Sketch:
@configspec
class AwsCredentials(AwsCredentialsWithoutDefaults, CredentialsWithDefault):
    _credentials_are_refreshable: bool = False

    def _from_session(self, session):
        from botocore.credentials import RefreshableCredentials

        default = super()._from_session(session)
        if default is not None:
            self._credentials_are_refreshable = isinstance(default, RefreshableCredentials)
        return default

    def to_s3fs_credentials(self):
        creds = super().to_s3fs_credentials()
        if self._credentials_are_refreshable:
            for k in ("key", "secret", "token"):
                creds.pop(k, None)
        return creds
The same freeze likely affects `to_pyiceberg_fileio_config` and `to_object_store_rs_credentials`; these could be addressed in the same PR or separately.
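For a regression test, the key-dropping step of the sketch could be isolated as a pure function. This is only a possible shape, not existing dlt code: `drop_static_credentials` and `STATIC_S3FS_KEYS` are hypothetical names, and the `endpoint_url` entry in the usage below is just an illustrative extra kwarg.

```python
# Kwargs that carry frozen credential strings in the s3fs config
# (the same three names used in the sketch above).
STATIC_S3FS_KEYS = ("key", "secret", "token")


def drop_static_credentials(s3fs_kwargs, refreshable):
    """Return a copy of the kwargs, minus static credentials if refreshable.

    The input dict is never mutated, so callers can keep the original.
    """
    if not refreshable:
        return dict(s3fs_kwargs)
    return {k: v for k, v in s3fs_kwargs.items() if k not in STATIC_S3FS_KEYS}


# Usage with made-up values:
example = {"key": "AKIA...", "secret": "s", "token": "t", "endpoint_url": None}
stripped = drop_static_credentials(example, refreshable=True)
untouched = drop_static_credentials(example, refreshable=False)
```

Keeping the non-refreshable path unchanged means explicitly configured static keys keep working as they do today.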
Related