Is there an existing issue for this?
Is your feature request related to a problem?
I have Tetragon installed on a high-load node with more than 10 different tracing policies.
I’m seeing many errors reported by tetragon_bpf_missed_events_total (ENOSPC), tetragon_observer_ringbuf_queue_events_lost_total, and tetragon_observer_ringbuf_events_lost_total. I tried increasing --rb-queue-size and --rb-size-total, but that doesn’t seem to help and sometimes even makes things worse.
Describe the feature you would like
It would be great to have a documentation page with recommendations for tuning Tetragon performance under high load. There are metrics that indicate drops and errors both in the kernel and in the user-space queues, and there are multiple configuration options for ring buffers, caches, and queues. However, it’s not obvious which of these settings should be adjusted to reduce the number of errors and dropped events.
Describe your proposed solution
No response
Code of Conduct