hive: metrics support with prometheus and grafana #665
This PR implements support for Prometheus metrics collection and automated Grafana setup 🎉
Like `hiveproxy`, hive now optionally runs two additional global containers in the background: a Prometheus server and a Grafana instance.

Run hive in dev mode to keep the hive server and its metrics containers running across test runs, so you can explore the metrics during and after simulator runs.
Hive automatically adds a Prometheus scrape target for every container with configured metrics options, and automatically removes it again. The hive API that creates client containers uses the client's hive metadata to pass the scrape-target option to the function that creates the container.
See the updated hive docs on client configuration. TL;DR: an optional `metrics` entry in the client's `hive.yaml` defines a scrape target with a port and labels, and hive dynamically adds labels such as `suite`, `test`, `version`, etc.

Metrics are disabled by default, but can be enabled and configured with three new flags, including `-metrics.prometheus` and `-metrics.grafana` (described below).
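The exact `hive.yaml` schema is in the updated hive docs. The sketch below assumes a hypothetical shape (a `Port` plus a static `Labels` map) to show how a client's static labels could be combined with the labels hive adds per run. `metricsConfig` and `scrapeTarget` are illustrative names, and letting hive's dynamic labels win on conflicts is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsConfig mirrors a hypothetical `metrics` entry in a client's hive.yaml.
type metricsConfig struct {
	Port   int
	Labels map[string]string
}

// scrapeTarget builds the host:port address and the full label set for a
// container. Dynamic labels (suite, test, version, ...) are applied last,
// so in this sketch they override static labels with the same name.
func scrapeTarget(ip string, cfg metricsConfig, dynamic map[string]string) (string, map[string]string) {
	labels := make(map[string]string, len(cfg.Labels)+len(dynamic))
	for k, v := range cfg.Labels {
		labels[k] = v
	}
	for k, v := range dynamic {
		labels[k] = v
	}
	// JoinHostPort also brackets IPv6 addresses correctly.
	return net.JoinHostPort(ip, strconv.Itoa(cfg.Port)), labels
}

func main() {
	cfg := metricsConfig{Port: 5054, Labels: map[string]string{"role": "beacon"}}
	addr, labels := scrapeTarget("172.17.0.4", cfg, map[string]string{
		"suite":   "eth2-testnet",
		"test":    "finality",
		"version": "v1.0.0",
	})
	fmt.Println(addr, labels)
}
```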
Long-term, we could also consider adding a third optional metrics container: there's a Grafana image renderer Docker image that renders Grafana in a headless browser, making an API available to generate images of dashboards or individual panels. That way we could generate and persist metrics reports for simulator runs! For now we can start with regular Grafana, which is useful during development, and begin building nice hive Grafana dashboards.
Prometheus
The targets tab of the Prometheus admin frontend (reachable from the host when exposed with `-metrics.prometheus=9090`) lists these scrape targets. They are available for Grafana charts to query, and their labels can be used to filter the data by test run, client, etc.
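Filtering by these labels in a Grafana chart comes down to PromQL label matchers. Below is a small hypothetical helper (not part of hive) that turns a label map into a selector such as `up{client="lighthouse",suite="eth2-testnet"}`. PromQL string literals follow Go's escaping rules, so `strconv.Quote` is used for the values; label names are assumed to be valid and are not checked.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// selector builds a PromQL instant-vector selector with equality matchers.
// Keys are sorted so the output is deterministic; with no labels it returns
// the bare metric name.
func selector(metric string, labels map[string]string) string {
	if len(labels) == 0 {
		return metric
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		// strconv.Quote produces a double-quoted, Go-escaped literal.
		matchers = append(matchers, k+"="+strconv.Quote(labels[k]))
	}
	return metric + "{" + strings.Join(matchers, ",") + "}"
}

func main() {
	fmt.Println(selector("up", map[string]string{"suite": "eth2-testnet", "client": "lighthouse"}))
}
```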
Grafana
The default port is 8080, but this can be changed with the `-metrics.grafana` flag.

As an example, the PR includes the Lighthouse `Summary` dashboard, taken from https://github.com/sigp/lighthouse-metrics and modified to use the provisioned Prometheus datasource. With the upcoming eth2 testnet-setup deduplication work and new simulators, we can build a better hive Ethereum testnet dashboard, and client teams may want to add dashboards for their own clients.
All dashboards live in the `internal/libdocker/graf/dashboards` directory and can be grouped into folders with a nested directory structure. Just make sure they use the provisioned Prometheus datasource (its UID is hardcoded and won't change).

Reviewing
Apart from the `Summary.json` dashboard source file (4300 lines), the diff is only 600 lines. Let me know if I can help explain or document the hive changes themselves better.