Let Linux system metrics monitor log only changing values #65

Open
wants to merge 1 commit into
base: master


sgrimm-sg

The Linux system metrics monitor produces a lot of metrics and logs them
a couple of times a minute. Some of them change only slowly, others not at all.

This change adds a configuration option to allow non-changing metrics to
be logged less frequently. The default behavior is unchanged (log
everything on every monitor run) but you can now do

  implicit_metric_monitor: false,
  monitors: [
    {
      // Log the full set of system metrics only once an hour
      module: "scalyr_agent.builtin_monitors.linux_system_metrics",
      log_all_interval: 3600
    }
  ]

to only log the full set of metrics once an hour. For the rest of the
hour, only metrics whose values have changed will be logged. This cuts
down significantly on the size of the system metrics logs, which can
add up to a bunch of data as the number of hosts grows.
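
For readers skimming the thread, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the `ChangeFilter` class, its `select` method, and the injectable `clock` are hypothetical names, and only `log_all_interval` comes from the description.

```python
import time

_MISSING = object()  # sentinel so a metric never seen before always counts as changed


class ChangeFilter:
    """Hypothetical helper: picks which metric samples to log on a monitor run."""

    def __init__(self, log_all_interval=0, clock=time.time):
        # log_all_interval <= 0 keeps the default: log everything on every run.
        self.log_all_interval = log_all_interval
        self._clock = clock
        self._last_values = {}
        self._last_full_log = None

    def select(self, samples):
        """Return the subset of `samples` (name -> value) that should be logged."""
        now = self._clock()
        full = (
            self.log_all_interval <= 0
            or self._last_full_log is None
            or now - self._last_full_log >= self.log_all_interval
        )
        if full:
            # Time for a full dump: log every metric, changed or not.
            self._last_full_log = now
            selected = dict(samples)
        else:
            # Between full dumps, log only metrics whose value differs from last run.
            selected = {
                name: value
                for name, value in samples.items()
                if self._last_values.get(name, _MISSING) != value
            }
        self._last_values.update(samples)
        return selected
```

With `log_all_interval=3600`, the first run and every run at least an hour after the previous full dump return all samples; runs in between return only the entries that changed.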

@czerwingithub
Contributor

My apologies that it took us this long to respond to your pull request.

This is a nice idea -- only logging the value for certain metrics if their value changes. It is a common technique with metrics. However, this currently won't play well with how you use metrics with Scalyr.

For example, if you are graphing a composite of multiple metrics, this will throw it off: the min, max, or average memory usage across a fleet of servers would be skewed by the missing data points from a server whose memory usage isn't changing.
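
A toy example of that skew, with made-up numbers: the host names, values, and the `fleet_average` helper are all hypothetical, and the aggregation here is a plain mean over whatever points fall in the graph window, which may not match how Scalyr actually aggregates.

```python
from statistics import mean


def fleet_average(points):
    """Average every (host, value) data point present in the window."""
    return mean(value for _host, value in points)


# Two monitor runs in one window. host-a is steady at 80% memory,
# host-b moves from 20% to 40%.
every_run = [("host-a", 80), ("host-a", 80), ("host-b", 20), ("host-b", 40)]

# Change-only logging, with no full dump inside this window:
# host-a's unchanged value is never written.
changes_only = [("host-b", 20), ("host-b", 40)]

print(fleet_average(every_run))     # 55
print(fleet_average(changes_only))  # 30
```

The steady host drops out of the window entirely, so the fleet average reflects only the host whose value moved.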

Also, some of these metrics are cumulative values: the agent reports the running total and the server calculates the delta at ingestion time. For example, the accumulated CPU tick counts are reported, and the server computes a delta to get the actual CPU used during the interval. If the cumulative values are not reported frequently enough, the server will not compute the deltas.
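
A rough sketch of the delta computation described here, under stated assumptions: the `deltas` function and its `max_gap` threshold are hypothetical, chosen only to illustrate why sparse samples of a cumulative counter can yield no deltas. The real server-side rule is not specified in this thread.

```python
def deltas(samples, max_gap):
    """Turn cumulative samples into per-interval deltas.

    samples: list of (timestamp_seconds, cumulative_value), sorted by time.
    max_gap: assumed limit (seconds) between samples beyond which no delta
             is computed; the actual server behavior may differ.
    """
    result = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 - t0 > max_gap:
            continue  # samples too far apart: skip rather than guess a rate
        result.append((t1, v1 - v0))
    return result


# Reported every 30 seconds: each pair of neighbors yields a delta.
frequent = [(0, 100), (30, 130), (60, 150)]
# Reported an hour apart: the gap exceeds max_gap, so no delta at all.
sparse = [(0, 100), (3600, 150)]

print(deltas(frequent, max_gap=60))  # [(30, 30), (60, 20)]
print(deltas(sparse, max_gap=60))    # []
```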

If the main goal is to cut down on the number of bytes uploaded, what would you say to either providing configuration options to eliminate some of the metrics you do not care about, or adjusting the frequency with which the metrics are written out?

@sgrimm-sg
Author

A way to specify which metrics to log, and how often, would be great.

That said, I'm not sure this change has quite as much of a problem with missing data as it appears, and I think it may not be mutually exclusive with an "only show metrics matching this set of patterns" kind of option. One reason I added the log_all_interval setting here is specifically to support the kind of use case you mention. If you are tracking the values of metrics with an "average over the last 15 minutes" aggregation function, you can lower that interval to, say, 5 minutes and there will be multiple values for each host for the reporting tools to look at.
