`/-/healthy` should fail if metrics are inconsistent.

## Feature request

We are seeing Pushgateway occasionally get into a state where it will not accept metrics. The UI reports everything as "last push failed", and no new metrics are collected. This is rare and only happens in prod. Killing the process fixes the issue (so it doesn't seem to be persisted state). It looks a lot like the situation with identical metrics across groups, but seems to impact everything. At this point the pod does still seem to serve `/metrics` (Prometheus sees samples from scrapes).

If clients are submitting bad data, one option would be to run with `--push.disable-consistency-check`, wait for `/metrics` scrapes to fail, and have the pod die. An even nicer approach would be that, once scrapes fail, `/-/healthy` also fails (the process is literally unhealthy and won't serve metrics), allowing orchestration to kill it.

**What did you do?**
Run
```
docker run -p 9091:9091  prom/pushgateway:v1.3.0 --push.disable-consistency-check

cat <<EOF | curl -X POST --data-binary @- http://127.0.0.1:9091/metrics/job/some_job/tag/val1
# TYPE some_metric counter
some_metric 1
EOF

cat <<EOF | curl -v -X POST --data-binary @- http://127.0.0.1:9091/metrics/job/some_job
# TYPE some_metric counter
some_metric{tag="val1"} 42
EOF
```
At this point the server is in an inconsistent state:

```
$ curl -v  http://localhost:9091/metrics
...
< HTTP/1.1 500 Internal Server Error
...
```
**What did you expect to see?**
Ideally, once `/metrics` cannot be served, `/-/healthy` should return an error code.

**What did you see instead? Under which circumstances?**
`/-/healthy` still returns 200 while `/metrics` returns 500:

```
$ curl -v  http://localhost:9091/-/healthy
...
< HTTP/1.1 200 OK
```
 

* Pushgateway version:

v1.3.0

* Pushgateway command line:

`--push.disable-consistency-check`

* Logs:
(I've no complaint about anything in the logs)
```
level=info ts=2023-03-24T10:21:11.546Z caller=main.go:83 msg="starting pushgateway" version="(version=1.3.0, branch=HEAD, revision=c28992c985ce6c4fcf4247ba9736b72a3d43882f)"
level=info ts=2023-03-24T10:21:11.546Z caller=main.go:84 build_context="(go=go1.15.2, user=root@cf69166ae53e, date=20201001-12:03:34)"
level=info ts=2023-03-24T10:21:11.547Z caller=main.go:137 listen_address=:9091
level=error ts=2023-03-24T10:21:26.999Z caller=main.go:56 msg="error gathering metrics: collected metric \"some_metric\" { label:<name:\"instance\" value:\"\" > label:<name:\"job\" value:\"some_job\" > label:<name:\"tag\" value:\"val1\" > counter:<value:42 > } was collected before with the same name and label values\n"
level=info ts=2023-03-24T10:25:06.323Z caller=main.go:249 msg="received SIGINT/SIGTERM; exiting gracefully..."
level=error ts=2023-03-24T10:25:06.323Z caller=main.go:197 msg="HTTP server stopped" err="accept tcp [::]:9091: use of closed network connection"
```

