Telemetry rework: Moving towards a Prometheus-first model #3663
Comments
Tagging @Neverlord since you might have thoughts about all of this.
This work is mostly done at this point, absent a few things:
Another question: We have an option called
@ckreibich and I discussed this at length in a call. Here are the decisions made:
I've been working for a few months on a big rework of the telemetry/metrics system inside Zeek. It's been through a few iterations, and I've reached a point where I have some questions I can't answer on my own, so I'm raising them in a ticket. The current design looks like this:
[...] `metrics_port` field is added to the cluster node record that allows setting the port used by the node for metrics output.
[...] `OpaqueVal` interface to script-land plus a few other niceties.
Comments/questions above:
(2) This is a breaking change. All of the script-land options that were in the `Broker` module were removed, including the option change handlers, etc.
(3) This will add a management burden for users. Primarily, they need to plumb through whatever ports they configured for each node so that Prometheus can reach every node.
(4) This is where the majority of my questions lie:
- [...] `RecordVal` can't be filled in, and the name field will contain more than just the name itself. According to the Prometheus documentation, a metric name should have a prefix but is not required to have one. I could assume that every metric in Zeek has a prefix, take the first word of the name (everything before the first underscore), and use that as the prefix value, but that breaks down if some external library does not include a prefix on one of its metrics. See the telemetry log policy for why this is a problem: it does some filtering based on the prefix values returned from Zeek.
- [...] `RecordVal`. It's not possible to scrape this out of the full name because units may contain underscores, so there is no way to know which words in a name make up the unit. This page shows some samples of units, plus general naming rules for the other fields.
- [...] `is_sum` information separate, but we can interpret a metric name ending with `_total` to mean the same thing.
- [...] `double`s, so I have to do a little bit of finagling with external metrics to set the right type in the `RecordVal`.
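To make the naming questions in (4) concrete, here is a minimal Python sketch of the heuristics under discussion: splitting on the first underscore to guess a prefix, treating a `_total` suffix as `is_sum`, and listing the possible units for a name. It is an illustration only, not Zeek code. `NameGuess`, `guess_fields`, and `unit_candidates` are hypothetical names, and the sample metric names are chosen just for the example.

```python
from typing import List, NamedTuple, Optional


class NameGuess(NamedTuple):
    prefix: Optional[str]  # None when there is no underscore to split on
    rest: str              # the name with the guessed prefix removed
    is_sum: bool           # True when the full name ends in "_total"


def guess_fields(full_name: str) -> NameGuess:
    """Guess prefix and is_sum from a full metric name (heuristic only)."""
    is_sum = full_name.endswith("_total")
    prefix, sep, rest = full_name.partition("_")
    if not sep:
        # No underscore at all, so there is nothing to treat as a prefix.
        return NameGuess(None, full_name, is_sum)
    return NameGuess(prefix, rest, is_sum)


def unit_candidates(rest: str) -> List[str]:
    """List every trailing run of words that could plausibly be the unit."""
    words = rest.split("_")
    if words[-1] == "total":
        words = words[:-1]  # "_total" marks a sum, it is not part of the unit
    # Keep at least one leading word as the metric's own name.
    return ["_".join(words[i:]) for i in range(1, len(words))]


print(guess_fields("process_cpu_seconds_total"))
# NameGuess(prefix='process', rest='cpu_seconds_total', is_sum=True)

# A name with no real prefix: the first word is still reported as one.
print(guess_fields("requests_total"))
# NameGuess(prefix='requests', rest='total', is_sum=True)

# Units may contain underscores, so several readings are equally valid.
print(unit_candidates("transmit_bytes_per_second"))
# ['bytes_per_second', 'per_second', 'second']
```

The second call shows the failure mode for the prefix guess: a library that exports an unprefixed name gets its first word reported as the prefix. The third call shows why the unit cannot be recovered from the name alone, since nothing in the string distinguishes `bytes_per_second` from `per_second` or `second`.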