Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prometheus metrics #429

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Add prometheus metrics #429

wants to merge 9 commits into from

Conversation

mozzieongit
Copy link
Member

@mozzieongit mozzieongit commented Feb 11, 2025

This adds support for Prometheus metrics, exposed using libevent's HTTP server
implementation. It is feature gated with --enable-prometheus-metrics.

The statistics export and socket code is based on code from remote.c.

Here is an excerpt from the exported metrics:

nsd_queries_total{server="0"} 27
nsd_queries_total{server="1"} 22
nsd_queries_by_type_total{type="A"} 42
nsd_queries_by_class_total{class="IN"} 49
nsd_queries_by_opcode_total{opcode="QUERY"} 49
nsd_queries_by_rcode_total{rcode="NOERROR"} 48
nsd_queries_by_transport_total{transport="udp"} 25
nsd_queries_with_edns_total 49
nsd_queries_with_edns_failed_total 0

nsd_connections_total{transport="tcp"} 24
nsd_connections_total{transport="tls6"} 0

nsd_xfr_requests_served_total{xfrtype="AXFR"} 0
nsd_xfr_requests_served_total{xfrtype="IXFR"} 0

nsd_queries_dropped_total 0
nsd_queries_rx_failed_total 0

nsd_answers_tx_failed_total 0
nsd_answers_without_aa_total 0
nsd_answers_truncated_total 0

nsd_time_up_seconds_total 970.392208

nsd_size_db_on_disk_bytes 0
nsd_size_db_in_mem_bytes 0
nsd_size_xfrd_in_mem_bytes 1163464
nsd_size_config_on_disk_bytes 0
nsd_size_config_in_mem_bytes 2208

nsd_zones_primary 1
nsd_zones_secondary 0

nsd_zonestats_examplezone_queries_total 48
nsd_zonestats_examplezone_queries_by_type_total{type="A"} 41

@mozzieongit mozzieongit self-assigned this Feb 11, 2025
Metrics should not contain duplicate data, summing all metrics of
a given name (regardless of labels) needs to be meaningful. Therefore
the counter metrics based on different DNS aspects have been split into
separate metric names.
@mozzieongit
Copy link
Member Author

mozzieongit commented Feb 17, 2025

1. The elapsed time (`nsd_time_elapsed_seconds`) doesn't have much value
   currently, and doesn't reset when `nsd-control stats` is used, so I would
   probably remove it. However, it could be beneficial to reset this time
   whenever `nsd-control` resets the stats.

I disabled the elapsed time metric, until I point it to the remote-control time value.

2. For the zonestats block: Should the label name "zone" maybe renamed to
   "zonegroup" or something else?

I used zonestats, to copy the name from the config file.

3. Currently, the metrics don't have any HELP text (see [Prometheus Exposition Formats](https://prometheus.io/docs/instrumenting/exposition_formats/#comments-help-text-and-type-information))

Help and type annotations added.

@mozzieongit mozzieongit marked this pull request as ready for review February 17, 2025 13:48
Copy link
Member

@wcawijngaards wcawijngaards left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, and in general I like it, but I would want to change the copy of the statistics management code from remote.c into metrics.c, instead expose one or more static routines from remote.c and call them from metrics.c, making the code more understandable and maintainable.

@@ -162,6 +162,12 @@ nsd_options_create(region_type* region)
opt->server_cert_file = CONFIGDIR"/nsd_server.pem";
opt->control_key_file = CONFIGDIR"/nsd_control.key";
opt->control_cert_file = CONFIGDIR"/nsd_control.pem";
#ifdef USE_METRICS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics options are not in nsd-checkconf.c, in the functions config_print_zone and config_test_print_server the options also need to be handled.

"Uptime since server boot in seconds.", "counter");
evbuffer_add_printf(buf, "nsd_time_up_seconds_total %lu.%6.6lu\n",
(unsigned long)uptime.tv_sec, (unsigned long)uptime.tv_usec);
/* TODO: re-add when elapsed time resetting on nsd-control stats is implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this have for example an implementation with if(clear) xfrd->nsd->rc->stats_time = *now;, similar to remote.c:3230. The comment could also be about something else.


/** process the statistics and write them into the buffer */
static void
process_stats(struct evbuffer *buf, xfrd_state_type* xfrd, int peek)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The routines are copies of code at remote.c, I do not think that copying code that manages the inter-process communication part of the statistics like that is a very good idea. It would be better to make the routine public in remote.c and add its definition to remote.h and then call it here. The process_stats() function then needs to be changed to allow the different print_stats call than the print_stats call from remote.c.

The port number for the HTTP service. Default is 9100.
.TP
.B metrics\-path:\fR <string>
The HTTP path to expose the metrics at. Default is /metrics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the path name extent is more precise with a quoted string.

Suggested change
The HTTP path to expose the metrics at. Default is /metrics.
The HTTP path to expose the metrics at. Default is "/metrics".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants