You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The javadoc of totalLoadTime says thatit’s an ever increasing value so you are right, maybe this should be a counter so that it signals the nature and you can easily calculate the rate on the backend. I guess the reason why this isn’t a counter is because different backends use different time units and our counters don’t have the notion of time but TimeGauge can convert the recorded value to the unit of the backend (seconds for Prometheus, millis for StatsD, etc.).
But we are not 100% consistent on that either check ProcessorMetrics.java, the cpu time is a counter there and the unit is always ns even if everything time related in your registry is in seconds.
So in the case of TimeGauge (caffeine), you will get the right unit but not the counter nature but in the case of FunctionCounter (cpu time) you will get the counter nature but the same unit for every registry.
Maybe we should use a FunctionTimer and for its count property use successCount + failureCount. Or come up with a new Meter that is like a FunctionTimer without a count property (only tracks total time), basically a FunctionCounter that can convert time units.
I looked into this a little, effectively, CaffeineCacheMetrics does this:
TimeGauge.builder("cache.load.duration", cache, TimeUnit.MILLISECONDS, c -> 300).register(registry);
FunctionCounter.builder("cache.load", cache, c -> 1).tags("result", "success").register(registry);
FunctionCounter.builder("cache.load", cache, c -> 2).tags("result", "failure").register(registry);
So the output is something like this:
# HELP cache_load_total
# TYPE cache_load_total counter
cache_load_total{result="failure"} 2.0
cache_load_total{result="success"} 1.0
# HELP cache_load_duration_seconds
# TYPE cache_load_duration_seconds gauge
cache_load_duration_seconds 0.3
Maybe we should do this:
FunctionTimer.builder("cache.load.duration", cache, c -> 3, c -> 300, TimeUnit.MILLISECONDS).register(registry);
FunctionCounter.builder("cache.load", cache, c -> 1).tags("result", "success").register(registry);
FunctionCounter.builder("cache.load", cache, c -> 2).tags("result", "failure").register(registry);
So the output would look like like:
# HELP cache_load_total
# TYPE cache_load_total counter
cache_load_total{result="failure"} 2.0
cache_load_total{result="success"} 1.0
# HELP cache_load_duration_seconds
# TYPE cache_load_duration_seconds summary
cache_load_duration_seconds_count 3
cache_load_duration_seconds_sum 0.3
Other than cache_load_duration_seconds_count is redundant (equals to cache_load_total), the above (as-is) is a breaking change so we need to figure out how to keep the old time series (cache_load_duration_seconds) too.
Describe the bug
In the binder CaffeineCacheMetrics.java (the metric cache.load.duration metric) uses a TimeGauge to store the value reported from method in totalLoadTime() in caffeine, which is an ever growing value: (https://www.javadoc.io/doc/com.github.ben-manes.caffeine/caffeine/2.2.2/com/github/benmanes/caffeine/cache/stats/CacheStats.html#totalLoadTime--).
Environment
I found this in our production environment
To Reproduce
micrometer/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/cache/CaffeineCacheMetrics.java
Line 190 in f3f5d76
Expected behavior
The expected behaviour is to have the cache.load.duration as a meter of Counter Type.
Additional context
I asked in the slack channel about this and @jonatan-ivanov kindly answered and asked me to report this a as bug. He also had some good suggestions on how what type to use. Here is a link to that answer: https://micrometer-metrics.slack.com/archives/C98P1USPR/p1736375533224759?thread_ts=1736323360.611049&cid=C98P1USPR
The text was updated successfully, but these errors were encountered: