Skip to content

metrics: fix overhead_program_seconds_total unit (ns -> s)#4830

Open
danilovid wants to merge 1 commit intocilium:mainfrom
danilovid:fix/overhead-program-seconds-unit
Open

metrics: fix overhead_program_seconds_total unit (ns -> s)#4830
danilovid wants to merge 1 commit intocilium:mainfrom
danilovid:fix/overhead-program-seconds-unit

Conversation

@danilovid
Copy link
Copy Markdown

The metric tetragon_overhead_program_seconds_total was incorrectly storing
nanoseconds (raw RunTimeNs from BPF program stats) despite the _seconds_total
naming convention.

The root cause was an early conversion of time.Duration to uint64 in
ProgOverhead.RunTime, which lost the type context. Following the Prometheus
convention, time.Duration is now kept throughout and converted to seconds
only when collecting metrics using m.Seconds().

Fixes #4829

Description

Keep time.Duration in ProgOverhead.RunTime instead of converting to uint64,
and use m.Seconds() when reporting to Prometheus.

Changelog

Fix `tetragon_overhead_program_seconds_total` metric to correctly report seconds instead of nanoseconds

@danilovid danilovid requested a review from a team as a code owner April 7, 2026 15:35
@danilovid danilovid requested a review from will-isovalent April 7, 2026 15:35
Copy link
Copy Markdown
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Copy link
Copy Markdown
Member

@mtardy mtardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show us what the metric now typically looks now? thanks

@mtardy mtardy added the release-note/bug This PR fixes an issue in a previous release of Tetragon. label Apr 8, 2026
@danilovid
Copy link
Copy Markdown
Author

Could you show us what the metric now typically looks now? thanks

Here's what the metric looks like currently - Grafana shows ~4 hours per BPF program call.
image

Here is the current state of the panel in Grafana.
image

@mtardy
Copy link
Copy Markdown
Member

mtardy commented Apr 8, 2026

Could you show us what the metric now typically looks now? thanks

Here's what the metric looks like currently - Grafana shows ~4 hours per BPF program call.

ah sorry I was unclear I meant after this patch, the typical output in the HTTP reply, I just wanted to make sure there's no rounding or so, it should be 10-9/10-8 numbers I imagine.

@danilovid danilovid changed the title metrics: fix overhead_program_seconds_total to report seconds instead of nanoseconds metrics: fix overhead_program_seconds_total unit (ns -> s) Apr 8, 2026
@danilovid
Copy link
Copy Markdown
Author

danilovid commented Apr 8, 2026

Could you show us what the metric now typically looks now? thanks

Here's what the metric looks like currently - Grafana shows ~4 hours per BPF program call.

ah sorry I was unclear I meant after this patch, the typical output in the HTTP reply, I just wanted to make sure there's no rounding or so, it should be 10-9/10-8 numbers I imagine.

After the patch, the metric values look correct:

tetragon_overhead_program_seconds_total{attach="wake_up_new_task",...} 0.003527708
tetragon_overhead_program_runs_total{attach="wake_up_new_task",...} 249

Average: 0.003527708 / 249 = ~14 µs per call - which is a realistic value for a BPF program.

Before the patch the same metric would show values like 3527708 (nanoseconds reported as seconds).

@mtardy
Copy link
Copy Markdown
Member

mtardy commented Apr 8, 2026

It looks like it's almost there you just need to fix checkpatch complaints, mostly commit title length and sign off missing:

=========================================================
[1/1] Running on b66f2b4ebf5d6ebe1efed863484126aeb4ae2922
metrics: fix overhead_program_seconds_total to report seconds instead of nanoseconds
=========================================================
Error: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)


NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

"[PATCH] metrics: fix overhead_program_seconds_total to report seconds" has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BIT_MACRO C99_COMMENTS C99_COMMENT_TOLERANCE COMMIT_LOG_LONG_LINE COMMIT_MESSAGE COMPLEX_MACRO CONSTANT_CONVERSION CONST_STRUCT EMAIL_SUBJECT ENOSYS FILE_PATH_CHANGES FROM_SIGN_OFF_MISMATCH GIT_COMMIT_ID JIFFIES_COMPARISON LEADING_SPACE LONG_LINE LONG_LINE_COMMENT LONG_LINE_STRING MACRO_ARG_REUSE MACRO_WITH_FLOW_CONTROL MULTISTATEMENT_MACRO_USE_DO_WHILE NOT_UNIFIED_DIFF OPEN_ENDED_LINE PREFER_DEFINED_ATTRIBUTE_MACRO PREFER_KERNEL_TYPES PRINTK_WITHOUT_KERN_LEVEL REPEATED_WORD SPDX_LICENSE_TAG TRACE_PRINTK TRAILING_SEMICOLON TRAILING_STATEMENTS VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.
Error: ERROR:CUSTOM: Please avoid long commit subjects (max: 75, found: 84)

On sign-off

To improve tracking of who did what, the Tetragon project uses a "sign-off" procedure which you can learn more about in the Cilium project documentation.

Concretely it consists of adding your real name and an email address to your git config and using git commit -s to write your commit message. You can use git commit --amend -s to add the signature to existing commits.

See more details on how to comply with the "sign-off" procedure

To add your name and email to your git config:

git config user.email "mail@example.com"
git config user.name "firstname lastname"

Note: add the --global flag after config to configure this by default for repositories on your host.

Then you can use for new commits:

git commit -s

Or for existing commits on which you want to add the signed-off-by statement:

git commit --amend -s

For more information, see this extract from git-commit(1) man page regarding the -s flag:

-s, --signoff, --no-signoff
   Add a Signed-off-by trailer by the committer at the end of the commit log
   message. The meaning of a signoff depends on the project to which you're
   committing. For example, it may certify that the committer has the rights to
   submit the work under the project's license or agrees to some contributor
   representation, such as a Developer Certificate of Origin. (See
   http://developercertificate.org for the one used by the Linux kernel and Git
   projects.) Consult the documentation or leadership of the project to which
   you're contributing to understand how the signoffs are used in that project.

   The --no-signoff option can be used to countermand an earlier --signoff
   option on the command line.

The metric tetragon_overhead_program_seconds_total was incorrectly storing
nanoseconds (raw RunTimeNs from BPF program stats) despite the _seconds_total
naming convention.

The root cause was an early conversion of time.Duration to uint64 in
ProgOverhead.RunTime, which lost the type context. Following the Prometheus
convention, keep time.Duration throughout and only convert to seconds when
collecting metrics using m.Seconds().

Fixes: tetragon_overhead_program_seconds_total reporting values ~1e9 too large.
Signed-off-by: Ilya Danilov <danilovid.94@gmail.com>
@danilovid danilovid force-pushed the fix/overhead-program-seconds-unit branch from b66f2b4 to 903457c Compare April 8, 2026 10:23
@netlify
Copy link
Copy Markdown

netlify bot commented Apr 8, 2026

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 903457c
🔍 Latest deploy log https://app.netlify.com/projects/tetragon/deploys/69d62cb8e01fc50008c67f73
😎 Deploy Preview https://deploy-preview-4830--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Contributor

@olsajiri olsajiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note/bug This PR fixes an issue in a previous release of Tetragon.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Is tetragon_overhead_program_seconds_total intentionally storing nanoseconds?

4 participants