[Tracking] Collect & visualise sustainability-related metrics #20

Open
6 tasks
nikimanoledaki opened this issue Jan 10, 2024 · 4 comments

Comments

@nikimanoledaki
Contributor

nikimanoledaki commented Jan 10, 2024

This issue aims to investigate the sustainability-related metrics that could be implemented as part of our reference architecture.

The WG has so far identified the following use cases that each require a slightly different set of metrics:

SRE Metrics

Metrics used by CNCF project maintainers to make improvements at the application level. For example, as mentioned by @incertum in the issue linked before, these include Falco's own internal metrics (CPU, memory, and counters), traditional SRE metrics (CPU/memory usage), and energy metrics.

More information about this can be found in the Metrics section of the Green Reviews design document.

  • CPU usage
    • Typically measured as a percentage of one CPU, it can be compared with the number of available CPUs on the host. Falco's hot path is single-threaded, so it should not be able to exceed the capacity of one full CPU.
  • Memory RSS
    • Resident Set Size is the portion of memory held in RAM by a process.
  • Memory VSZ
    • Virtual Memory Size is the total virtual address space allocated to a process, including pages resident in RAM, pages swapped out, and mappings that have been reserved but not yet used.
  • container_memory_working_set_bytes in Kubernetes settings
    • This is almost equivalent to the cgroups container memory_used metric natively exposed in Falco metrics.
  • Traffic rate
    • packets/second
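
As a rough illustration of the CPU and memory definitions above, here is a minimal Python sketch. It assumes a Linux host where `/proc/<pid>/status` is readable, and it samples the current process rather than Falco. The function names `cpu_percent_of_one_cpu` and `parse_proc_status_kib` are hypothetical helpers for this example, not part of Falco or of the Green Reviews pipeline.

```python
import os
import re
import time


def cpu_percent_of_one_cpu(cpu_seconds_delta: float, wall_seconds_delta: float) -> float:
    """CPU time consumed over an interval, as a percentage of one CPU."""
    if wall_seconds_delta <= 0:
        raise ValueError("wall_seconds_delta must be positive")
    return 100.0 * cpu_seconds_delta / wall_seconds_delta


def parse_proc_status_kib(status_text: str) -> dict:
    """Extract VmRSS and VmSize (both in KiB) from /proc/<pid>/status text."""
    values = {}
    for key in ("VmRSS", "VmSize"):
        match = re.search(rf"^{key}:\s+(\d+)\s+kB", status_text, re.MULTILINE)
        if match:
            values[key] = int(match.group(1))
    return values


if __name__ == "__main__":
    # Sample the current process; a real collector would target the Falco PID.
    wall_start, cpu_start = time.monotonic(), time.process_time()
    sum(i * i for i in range(2_000_000))  # some busy work to measure
    wall_delta = time.monotonic() - wall_start
    cpu_delta = time.process_time() - cpu_start
    percent = cpu_percent_of_one_cpu(cpu_delta, wall_delta)
    print(f"CPU: {percent:.1f}% of one CPU, host has {os.cpu_count()} CPUs")
    try:
        with open("/proc/self/status") as status_file:  # Linux only
            print(parse_proc_status_kib(status_file.read()))
    except FileNotFoundError:
        print("/proc not available on this platform")
```

A single-threaded hot path should report at most roughly 100% here, whereas a multi-threaded process can exceed it, which is why the value is worth reading alongside the host's CPU count.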

Sustainability Metrics

Other emerging indices that can be used to assess an application's sustainability footprint may also be considered in the future.

Benchmark-Specific Metrics

Metrics to set up the benchmark tests for each CNCF Project.


These metrics are often inter-related. For example, data about energy consumption can be used in each of these scenarios.

This issue can be used to track ideas and discussions about which metrics the Green Reviews pipeline should collect. That being said, prioritisation is key so that the WG remains on track with the milestones that were set in the Roadmap by the group.

@nikimanoledaki nikimanoledaki changed the title [Tracking] Collect sustainability-related metrics [Tracking] Identify which sustainability-related metrics to collect Jan 10, 2024
@nikimanoledaki nikimanoledaki changed the title [Tracking] Identify which sustainability-related metrics to collect [Tracking] Identify metrics to collect Jan 16, 2024
@nikimanoledaki nikimanoledaki changed the title [Tracking] Identify metrics to collect [Tracking] Collect & visualise sustainability-related metrics Jan 17, 2024
@nikimanoledaki
Contributor Author

Looking at SRE Metrics, @incertum, do you already have a Grafana dashboard for these metrics? We would need to either create Prometheus queries or access them through the Falco internal metrics.
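
For the Prometheus route, a sketch of what such queries might look like is below. It assumes the cluster's Prometheus scrapes the kubelet/cAdvisor metrics `container_cpu_usage_seconds_total` and `container_memory_working_set_bytes`. The namespace, pod pattern, base URL, and the helper names `build_queries` and `instant_query_url` are all placeholders for illustration, not an agreed design.

```python
from urllib.parse import urlencode


def build_queries(namespace: str, pod_regex: str) -> dict:
    """Build PromQL strings for CPU and memory of the matching pods."""
    # Assumption: series carry the usual `namespace` and `pod` labels.
    selector = f'namespace="{namespace}", pod=~"{pod_regex}"'
    return {
        # Average number of CPU cores used over the last 5 minutes.
        "cpu_cores": f"sum(rate(container_cpu_usage_seconds_total{{{selector}}}[5m]))",
        # Current working set memory in bytes.
        "memory_working_set_bytes": f"sum(container_memory_working_set_bytes{{{selector}}})",
    }


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # "falco" and "falco-.*" are hypothetical values for this example.
    for name, query in build_queries("falco", "falco-.*").items():
        print(name, instant_query_url("http://prometheus:9090", query))
```

The same PromQL strings could be pasted into Grafana panels directly; the URL helper only matters if the pipeline pulls values programmatically.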

@incertum
Contributor

@nikimanoledaki Falco does not yet have a Prometheus exporter. We may have one for Falco 0.38 in May, but I need to check with the other maintainers. Meanwhile, Falco metrics are available as internal Falco rules whose output can be piped to log-rotated files (JSONL formatted).

I propose making the CNCF SRE Metrics independent of Falco and Falco's metrics: report the CPU and memory usage of project binaries through your preferred framework, and create your preferred Grafana dashboards. WDYT?

@nikimanoledaki
Contributor Author

nikimanoledaki commented Jan 31, 2024

I wonder if there are any useful metrics in the default metrics of Kubernetes, for example:

It would be nice to somehow surface the internal Falco metrics that way, but I'm not sure if that would be possible since those would be logs, not metrics.

What is the filesystem location where the internal Falco metrics are exported? These metrics are at the Pod level, correct?

Which Falco Metrics would you find useful or relevant for either 1) performance monitoring or 2) setting up the benchmark tests?

Looking at this, I imagine "kernel.evt_rate" is one that we would definitely need for the benchmark tests.
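
To make the "logs, not metrics" concern more concrete, here is a minimal sketch of pulling one numeric field out of JSONL-formatted metrics logs so it could be fed into a metrics system. The record layout is an assumption: the sketch looks for the field either at the top level or under an `output_fields` object, and the real Falco output may differ. `iter_field` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_field(lines: Iterable[str], field: str) -> Iterator[float]:
    """Yield the numeric value of `field` from each JSONL record that has it."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines, e.g. a partially written record
        if not isinstance(record, dict):
            continue
        # Assumption: values sit under "output_fields" or at the top level.
        fields = record.get("output_fields")
        if not isinstance(fields, dict):
            fields = record
        value = fields.get(field)
        if isinstance(value, (int, float)):
            yield float(value)


if __name__ == "__main__":
    # Made-up sample lines; not real Falco output.
    sample = [
        '{"output_fields": {"kernel.evt_rate": 1200.5}}',
        '{"kernel.evt_rate": 800}',
    ]
    print(list(iter_field(sample, "kernel.evt_rate")))
```

In practice the `lines` argument would be an open file handle on the rotated log, and the yielded values could be exposed through whatever exporter the pipeline settles on.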

@AntonioDiTuri
Contributor

I created two deep-dive tickets on the steps to collect the metrics and visualize them.
I made a distinction between Kepler- and Kubernetes-related metrics, which have a more standard approach, and Falco metrics, which need some more thought on the process. I hope that is clear; please let me know.

3 participants