-
-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prometheus metrics #429
base: master
Are you sure you want to change the base?
Conversation
Metrics should not contain duplicate data, summing all metrics of a given name (regardless of labels) needs to be meaningful. Therefore the counter metrics based on different DNS aspects have been split into separate metric names.
I disabled the elapsed time metric, until I point it to the remote-control time value.
I used
Help and type annotations added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, and in general I like it, but I would want to change the copy of the statistics management code from remote.c into metrics.c, instead expose one or more static routines from remote.c and call them from metrics.c, making the code more understandable and maintainable.
@@ -162,6 +162,12 @@ nsd_options_create(region_type* region) | |||
opt->server_cert_file = CONFIGDIR"/nsd_server.pem"; | |||
opt->control_key_file = CONFIGDIR"/nsd_control.key"; | |||
opt->control_cert_file = CONFIGDIR"/nsd_control.pem"; | |||
#ifdef USE_METRICS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metrics options are not in nsd-checkconf.c, in the functions config_print_zone and config_test_print_server the options also need to be handled.
"Uptime since server boot in seconds.", "counter"); | ||
evbuffer_add_printf(buf, "nsd_time_up_seconds_total %lu.%6.6lu\n", | ||
(unsigned long)uptime.tv_sec, (unsigned long)uptime.tv_usec); | ||
/* TODO: re-add when elapsed time resetting on nsd-control stats is implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this have for example an implementation with if(clear) xfrd->nsd->rc->stats_time = *now;
, similar to remote.c:3230. The comment could also be about something else.
|
||
/** process the statistics and write them into the buffer */ | ||
static void | ||
process_stats(struct evbuffer *buf, xfrd_state_type* xfrd, int peek) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The routines are copies of code at remote.c, I do not think that copying code that manages the inter-process communication part of the statistics like that is a very good idea. It would be better to make the routine public in remote.c and add its definition to remote.h and then call it here. The process_stats() function then needs to be changed to allow the different print_stats call than the print_stats call from remote.c.
The port number for the HTTP service. Default is 9100. | ||
.TP | ||
.B metrics\-path:\fR <string> | ||
The HTTP path to expose the metrics at. Default is /metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the path name extent is more precise with a quoted string.
The HTTP path to expose the metrics at. Default is /metrics. | |
The HTTP path to expose the metrics at. Default is "/metrics". |
This adds support for Prometheus metrics, exposed using
libevent
's HTTP serverimplementation. It is feature gated with
--enable-prometheus-metrics
.The statistics export and socket code is based on code from
remote.c
.Here is an excerpt from the exported metrics: