Metric for difference between endpoints that has persisted for t seconds #37

Open
ties opened this issue Nov 24, 2021 · 1 comment

ties commented Nov 24, 2021

As a user, I want a metric for the difference between various sources that has persisted for t seconds, so that I can monitor that my various sources (rtr, json, ...) converge.

Situation

  • Two different RPs
  • Refresh at different times

Because the publication of VRPs is continuous, the RPs will have a slightly different view of what VRPs exist. If you want to monitor that they converge, you could alert on: "the difference is continuously non-zero for 30 minutes" (and assume it drops to zero at some point in time).
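
For illustration, the "continuously non-zero for 30 minutes" condition could be sketched as below. This is only a sketch: `nonzero_duration` and `should_alert` are hypothetical names, and the samples are assumed to be `(timestamp, diff)` pairs scraped at regular intervals.

```python
from typing import Iterable, Optional, Tuple


def nonzero_duration(samples: Iterable[Tuple[float, int]], now: float) -> float:
    """Seconds for which the diff has been continuously non-zero up to `now`."""
    start: Optional[float] = None
    for timestamp, diff in sorted(samples):
        if timestamp > now:
            break
        if diff == 0:
            # The sources agreed at this point, so the streak is reset.
            start = None
        elif start is None:
            # First non-zero sample of a new streak.
            start = timestamp
    return 0.0 if start is None else now - start


def should_alert(samples: Iterable[Tuple[float, int]], now: float,
                 hold: float = 30 * 60) -> bool:
    """Naive rule: alert once the diff has been non-zero for `hold` seconds."""
    return nonzero_duration(samples, now) >= hold
```

Note that this rule only looks at the count. It cannot tell whether the same objects are missing the whole time or whether different objects differ at each scrape.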

In practice, this causes false positives if updates are frequent enough. Another way to go is to check what objects in A are not in B, and were seen in A at least visibility_seconds ago. That way you can have

This is similar to what I added to rtrmon, where there is a vrp_diff metric that only counts objects that were first seen in the source at least visibility_seconds ago.
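
A minimal sketch of that idea, not the rtrmon implementation: `SourceView` and `vrp_diff` are hypothetical names, and a VRP is assumed to be any hashable value such as an `(asn, prefix, max_length)` tuple.

```python
from typing import Dict, Hashable, Iterable


class SourceView:
    """Remembers when each VRP was first seen in one source."""

    def __init__(self) -> None:
        self.first_seen: Dict[Hashable, float] = {}

    def update(self, vrps: Iterable[Hashable], now: float) -> None:
        current = set(vrps)
        # Forget VRPs that are no longer published by this source.
        self.first_seen = {
            vrp: seen for vrp, seen in self.first_seen.items() if vrp in current
        }
        # Record the first-seen time for VRPs that are new in this refresh.
        for vrp in current:
            self.first_seen.setdefault(vrp, now)


def vrp_diff(lhs: SourceView, rhs: SourceView, now: float,
             visibility_seconds: float) -> int:
    """Count VRPs in lhs, absent from rhs, first seen at least visibility_seconds ago."""
    return sum(
        1
        for vrp, seen in lhs.first_seen.items()
        if vrp not in rhs.first_seen and now - seen >= visibility_seconds
    )
```

With `visibility_seconds=0` this is the instantaneous difference. Larger thresholds ignore objects that the other source has simply not had time to pick up yet.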

Maybe a real set of metrics is clearer:

# HELP rpki_vrps Total number of VRPS/amount of differents.
# TYPE rpki_vrps gauge
rpki_vrps{server="primary",type="diff",url="http://routinator-1:9556/json"} 1110
rpki_vrps{server="primary",type="total",url="http://routinator-1:9556/json"} 143981
rpki_vrps{server="secondary",type="diff",url="https://ca-software/api/monitoring/roa-prefixes"} 1
rpki_vrps{server="secondary",type="total",url="https://ca-software/api/monitoring/roa-prefixes"} 142872
# HELP rtr_serial Serial of the RTR session.
# TYPE rtr_serial gauge
rtr_serial{server="primary",url="http://routinator-1:9556/json"} 0
rtr_serial{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP rtr_session ID of the RTR session.
# TYPE rtr_session gauge
rtr_session{server="primary",url="http://routinator-1:9556/json"} 0
rtr_session{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP update Timestamp of last update.
# TYPE update gauge
update{server="primary",url="http://routinator-1:9556/json"} 1.637752522e+09
update{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 1.63775261e+09
# HELP vrp_diff Number of VRPS in [lhs_url] that are not in [rhs_url] that were first seen [visibility_seconds] ago in lhs.
# TYPE vrp_diff gauge
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="0"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1024"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1706"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="256"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="3411"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="56"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="596"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="851"} 1110
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="0"} 1
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1024"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1706"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="256"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="3411"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="56"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="596"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="851"} 0

In the diagram you see that the instantaneous difference grows, but the long-term difference never grows:
[Image: Screenshot 2021-11-24 at 12 23 15]
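
As a rough sketch of how the metrics above could be consumed, the snippet below reads the text exposition output and keeps only the vrp_diff samples at or above a chosen visibility threshold. `parse_samples` and `persistent_diffs` are hypothetical helpers, and the parser assumes simple label values without escaped quotes, as in the output shown here.

```python
import re
from typing import Dict, List, Tuple

SAMPLE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Parse labelled samples; skip blank lines, # HELP and # TYPE lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL.findall(labels)), float(value)))
    return samples


def persistent_diffs(text: str, min_visibility: int) -> Dict[Tuple[str, str], float]:
    """Largest vrp_diff per (lhs_url, rhs_url) with visibility_seconds >= min_visibility."""
    result: Dict[Tuple[str, str], float] = {}
    for name, labels, value in parse_samples(text):
        if name != "vrp_diff":
            continue
        if int(labels["visibility_seconds"]) < min_visibility:
            continue
        key = (labels["lhs_url"], labels["rhs_url"])
        result[key] = max(result.get(key, 0.0), value)
    return result
```

For the output above and a threshold of 1024 seconds, this would report 1110 for routinator-1 against ca-software and 0 for the opposite direction, so only the first direction would be worth alerting on.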

Member

partim commented Nov 24, 2021

Am I understanding you right that you want to track any single difference and how long it’s been around and count those that have been around for more than t seconds?

@partim partim added this to the 0.2.1 milestone Nov 29, 2021
@partim partim removed this from the 0.2.1 milestone Feb 16, 2023
@partim partim added this to the 0.3.0 milestone Nov 23, 2023