
Remote write doesn't handle SigV4 expiration gracefully #15186

@harry671003

Description


What did you do?

Prometheus logs show that when the SigV4 token expires, Prometheus doesn't handle it gracefully: the samples are dropped and not retried.

The metric rate(prometheus_remote_storage_samples_failed_total[5m]) also increases at the same time, which indicates that the samples were not sent successfully.

Logs are attached.

What did you expect to see?

Prometheus should handle token expiry gracefully and shouldn't lose samples.

What did you see instead? Under which circumstances?

N/A

System information

Linux

Prometheus version

Prometheus/2.50.1

Prometheus configuration file

remoteWrite:
  - url: https://aps-workspaces.<region>.amazonaws.com/<url>
    sigv4:
      region: <REGION>
      roleArn: arn:aws:iam::<ACCOUNT>:role/<ROLE>

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

ts=2024-10-17T19:40:40.208Z caller=dedupe.go:112 component=remote level=error remote_name=e94106 url=https://aps-workspaces.<region>.amazonaws.com/<url> msg="non-recoverable error while sending metadata" count=914 err="server returned HTTP status 403 Forbidden: {\"message\":\"The security token included in the request is expired\"}"
