Skip to content

perf: avoid unnecessray IO in FDUsageReporter #224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

kruskall
Copy link
Member

@kruskall kruskall commented Jun 9, 2025

What does this PR do?

similar to memstatreporter, we don't need to retrieve all process information.
fdusagereporter only cares about fd limits so only retrieve what we need using gopsutil

Why is it important?

fdusagereporter is a hot path and it's called quite often

benchmark shows significant improvement

              │  old-fd.txt  │             new-fd.txt             │
              │    sec/op    │   sec/op     vs base               │
FDreporter-20   126.33µ ± 3%   35.17µ ± 1%  -72.16% (p=0.002 n=6)

              │  old-fd.txt   │             new-fd.txt              │
              │     B/op      │     B/op      vs base               │
FDreporter-20   33.188Ki ± 0%   9.750Ki ± 0%  -70.62% (p=0.002 n=6)

              │ old-fd.txt │            new-fd.txt             │
              │ allocs/op  │ allocs/op   vs base               │
FDreporter-20   246.0 ± 0%   110.0 ± 0%  -55.28% (p=0.002 n=6)

Bench:

func BenchmarkFDreporter(b *testing.B) {
	r := FDUsageReporter(logp.NewLogger(""), &process.Stats{Hostfs: resolve.NewTestResolver(""), ProcsMap: process.NewProcsTrack()})
	for i := 0; i < b.N; i++ {
		r(monitoring.Full, monitoring.NewKeyValueVisitor(func(s string, i any) {
			//b.Log(fmt.Sprintf("%v=%v", s, i))
		}))
	}
}

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md

Author's Checklist

  • [ ]

Related issues

similar to memstatreporter, we don't need to retrieve all process
information.
fdusagereporter only cares about fd limits so only retrieve what
we need using gopsutil
@kruskall kruskall requested a review from a team as a code owner June 9, 2025 22:46
@kruskall kruskall requested review from AndersonQ and faec and removed request for a team June 9, 2025 22:46
Comment on lines 40 to 42
p := psprocess.Process{
Pid: int32(os.Getpid()),
}
Copy link
Member

@AndersonQ AndersonQ Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question]

If I understand it correctly, right now, those metrics are always from the process itself. However depending on hostFS it might be the PID the container sees or the host.
Therefore I wonder if using os.Getpid() might get the wrong PID.

Do you think it's a possibility?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good question. Linux has a different implementation of GetSelfPid (ie. read from /proc/self/stat if hostfs is set) but I'm not able to reproduce the mismatched PID.

Tested with:

package main

import (
	"bytes"
	"context"
	"fmt"
	"os"

	"github.com/shirou/gopsutil/v4/process"
)

func main() {
	pid := os.Getpid()
	p := process.Process{
		Pid: int32(pid),
	}
	open, err := p.NumFDsWithContext(context.Background())
	if err != nil {
		panic(err)
	}

	b, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		panic(err)
	}
	before, _, _ := bytes.Cut(b, []byte(" "))
	fmt.Printf("os.Getpid: %d\n", pid)
	fmt.Printf("self/stat: %s\n", before)
	fmt.Printf("open: %d\n", open)
}

run the app in a container with:

  • docker run -it -v .:/home/ --workdir=/home golang go run main.go
  • docker run -it -v .:/home/ --workdir=/home --net=host --pid=host golang go run main.go

It always returns the same PID and the correct open fds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it, but what I believe is the issue is when we use hostfs Go doesn't know that.
This is what I did to test:

package report

import (
	"fmt"
	"os"
	"testing"

	"github.com/elastic/elastic-agent-system-metrics/metric/system/process"
	"github.com/elastic/elastic-agent-system-metrics/metric/system/resolve"
)

func TestPID(t *testing.T) {
	osPID := os.Getpid()
	processStats := &process.Stats{
		Hostfs: resolve.NewTestResolver("/hostfs"),
	}
	err := processStats.Init()
	if err != nil {
		t.Fatalf("Failed to init process stats: %v", err)
	}

	ps, err := processStats.GetSelf()
	if err != nil {
		t.Fatalf("Failed to get self process: %v", err)
	}

	hostfsPID := ps.Pid
	fmt.Printf("Original PID: %d\n", osPID)
	fmt.Printf("HostFS PID: %d\n", hostfsPID.ValueOr(-1))
	fmt.Println("")
}

then run with:

docker run -it -v "/proc:/hostfs/proc:ro" -v "/:/hostfs:ro" -v .:/home/ --workdir=/home golang go test -v -run TestPID ./report

=== RUN   TestPID
Original PID: 4870
HostFS PID: 498234

--- PASS: TestPID (0.00s)
PASS
ok  	github.com/elastic/elastic-agent-system-metrics/report	0.072s

with your change PID is 4870, but before it'd be 498234

This is what I'm worried about.
I believe the fix here is just to keep getting the PID from processStats.GetSelf()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🙇

I've updated the code, it should work with custom hostfs now 👍

PS. that breaks host-container isolation, I really don't like it :(, is there someplace I can read up more on the requirement/usage of custom hostfs ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I learnt we have it while working with the beats and k8s/docker.
I know it's used in k8s to avoid giving elevated permissions to a container, so some host paths are mounted on the container, that way the container sees the host but without requiring permissions to do so.
But it's a good question, let me try to dig and see if I find an answer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants