BUG: Enable prometheus metrics #3675
Comments
Hi, sorry for the confusion. This is indeed related to an incorrect default command specified in the In the meantime, can you try setting the following command in
This should be roughly equivalent to the command that was previously specified in (If you have adjusted some of the Gunicorn configuration values such as the number of workers or the log level, that’s fine -- only thing that’s important is that you specify the Sorry again for the inconvenience! (We also have documentation about this feature in the works.) |
@lyz-code Here’s a link to the WIP documentation, but please take it with a grain of salt, as it is still a work in progress: https://github.com/alephdata/aleph/blob/docs/tech-docs/docs/src/pages/developers/how-to/operations/prometheus/index.mdx If you run into other problems, please let me know. I’m happy to help and will make sure to update the documentation accordingly. |
Hi @tillprochaska, first of all, thank you so much for the Prometheus work; it looks very promising. I haven't seen many applications with such detailed app metrics, so congratulations. I've followed your guides and now I'm seeing the following error on the API:
I didn't set any gunicorn configurations myself |
@lyz-code I think you’re onto something. It seems there was a mistake in our release process. I’ll let you know when I know more. |
Hi @lyz-code, sorry, just a quick update: This is indeed an issue with the 3.15.5 release. While we did include the Prometheus feature in the release candidates for 3.15.5, we made a mistake when releasing 3.15.5 and so it’s not actually included in that release. We’ll try to do a proper, new release soon. |
Hi @tillprochaska I've seen that 3.15.6 didn't fix the bug. I know you didn't say it has but I wanted to try :P. FYI, I'm seeing another error when spawning the exporter on the latest version.
I've also seen that the suggested port of the docker-compose for the |
Yes, you’re right! 3.15.6 is a security patch release, so we decided to not include anything else besides these patches. I’ll post an update here once the Prometheus feature is properly released. |
Sorry for the slow response. I’ve published a release candidate for a new release that should fix this issue. A final release will hopefully follow soon. Note that if you want to test this release candidate you might need to adjust your |
This should be resolved in the latest release (3.17.0). Keeping this open as I haven’t yet thought about your suggestion regarding the default port:
Hi @tillprochaska,
As you can see there are some missing metrics such as Also, have you already created a grafana dashboard to visualise the metrics? If not, shall we track this in this issue or should I open a new one? Thanks as always |
@lyz-code Thanks for the feedback! The metrics you’re looking at are only the metrics exported by the separate exporter. However, the
I fiddled around a little bit with a Grafana dashboard based on these metrics, but we’re not using Grafana internally (at least for now), so there are no immediate plans to publish/maintain an official dashboard. You can however open a separate feature request issue for this. And if you happen to build a dashboard yourself, there might be other Aleph admins who find this useful as well. |
Hi @tillprochaska thanks for the quick answer. Would it be possible that the
The only way for us to extract the metrics from the dockers would be to open a port per docker ( |
@lyz-code Hey, I can’t give you a definitive answer here, but as far as I know the recommended solution is to run Prometheus in the same network as the instances you’re monitoring. You might be able to run a Prometheus instance on the same host to aggregate metrics from all Aleph containers, then make use of federation to allow your main Prometheus instance to scrape them. PushProx may be another solution. However, I do not have any personal experience with any of these approaches. It’s unfortunately unlikely that we will implement aggregation of metrics from other containers in the exporter ourselves, as this would add lots of complexity and likely lead to problems for example when running multiple worker instances or automatically scaling the number of worker instances. |
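For anyone trying the federation route mentioned above, here is a minimal sketch of how a federation scrape URL is put together. Prometheus serves federated series from its `/federate` endpoint and selects them with `match[]` query parameters. Everything else here is an assumption for illustration: the helper name `federate_url`, the host `aleph-host:9090`, and the `job="aleph"` label are hypothetical and need to be replaced with whatever your local Prometheus instance actually uses.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, matchers: list[str]) -> str:
    """Build a /federate URL for a Prometheus instance.

    Each matcher is a series selector and is sent as a repeated
    `match[]` query parameter, URL-encoded.
    """
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical host and job label; adjust to your own setup.
    print(federate_url("http://aleph-host:9090", ['{job="aleph"}']))
```

The main Prometheus instance would then scrape this URL instead of opening one port per container. I have not tested this against an Aleph deployment.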
Thanks @tillprochaska it makes sense what you say. I'm fine with the closing of the issue (•‿•) |
Describe the bug
I've seen that Prometheus metrics have been available for a while but I'm not able to make them work.
To Reproduce
Steps to reproduce the behavior:
1. Set PROMETHEUS_ENABLED=true in your aleph.env file and restart Aleph
2. docker exec -it aleph_api_1 bash
3. curl http://localhost:9100
4. curl: (7) Failed to connect to localhost port 9100: Connection refused
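The "Connection refused" in the last step means nothing is listening on that port inside the container. This is different from the server being up but returning an error. A small sketch of the same check without curl follows; the helper name `port_is_open` is made up for illustration, and port 9100 is simply the one used in the steps above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening) and timeouts.
        return False


if __name__ == "__main__":
    # Run inside the API container; 9100 is the port from the steps above.
    print("metrics port reachable:", port_is_open("localhost", 9100))
```

If this prints False, the metrics server was never started, which points at the container's startup command rather than at networking.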
Expected behavior
Prometheus metrics are fetched
Aleph version
3.15.5
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
I'm not able to unset the command directive in the docker-compose file; maybe that's preventing the Prometheus metrics server from being loaded.