Describe the bug
The Cortex ruler logs show an EOF error when posting alerts to the Cortex alertmanager.
level=error caller=notifier.go:527 user=tenant-one alertmanager=http://cortex-alertmanager.cortex.svc.cluster.local:8080/api/prom/alertmanager/api/v1/alerts count=1 msg="Error sending alert" err="Post \"http://cortex-alertmanager.cortex.svc.cluster.local:8080/api/prom/alertmanager/api/v1/alerts\": EOF"
notifier.go is Prometheus code and may be missing a req.Close = true, as pointed out here.
Bug reports exist in the Prometheus repo:
- Prometheus reports an error EOF when sending alert to alertmanager prometheus/prometheus#9176
- Fail to send alerts to alertmanager due to EOF prometheus/prometheus#9057
In my case, the number of file descriptors used by the alertmanager process is below the default alertmanager file descriptor limit.
This bug does not seem to be tied to a specific alert and happens randomly.
To Reproduce
Steps to reproduce the behavior:
- set up ruler / alertmanager
- send alerts from ruler to alertmanager until an EOF error occurs
Expected behavior
alertmanager should receive all POSTed alerts correctly
Environment:
- Infrastructure: Kubernetes
- Deployment tool: Helm Cortex (1.6.0), Cortex (v1.13.0)