fix(httputil): racy in ResponseChain
#708
Conversation
fix(httputil): racy in `ResponseChain` and optimize memory usage.

This commit addresses a critical regression and improves memory efficiency in the `httputil` pkg.

Changes:

* `FullResponse*` now returns a safe, independent copy. Previously, it returned a slice referencing a buffer that was immediately returned to the pool, leading to data corruption (use-after-free).
* Make `Close()` idempotent. Calling `Close()` multiple times no longer causes a panic.
* Optimize `FullResponseString` to use zero-copy conversion from the safe byte slice.
* Optimize `limitedBuffer` to use a `sync.Pool` for temporary 32KB (via `chunkSize` = `DefaultChunkSize`) read chunks.

Signed-off-by: Dwi Siswanto <[email protected]>
```diff
-// The returned slice is valid only until Close() is called.
+// Note: This creates a new buffer internally which is returned to the pool.
+// The returned slice is a copy and remains valid even after Close() is called.
 func (r *ResponseChain) FullResponseBytes() []byte {
```
Here (and in `(*ResponseChain).FullResponseString()`), the inconsistency is intentional -- those are now a "safe copy". The previous implementation (#700, yeah, my bad) failed to manage the lifecycle correctly (returning a ptr to a buffer that was immediately freed), which caused the race condition.
And since the `(*ResponseChain).fullResponse` field was removed and headers+body are now stored in separate buffers, we must (forced anyway) allocate new memory to concat them -- rather than returning a fragile view into a temporary pooled buffer (which is racy).
tl;dr: `(*ResponseChain).FullResponse*` is now a copy because physics demands it (non-contiguous source data).
Shall we retract v0.7.0, @Mzack9999?
EDIT:
I don’t think we need to retract that since the issues are in the new API(s) anyway.
Bench:

```go
func BenchmarkResponseChain_FullResponse(b *testing.B) {
	body := bytes.Repeat([]byte("B"), 1024*1024) // 1MB
	resp := &http.Response{
		StatusCode: 200,
		Status:     "200 OK",
		Body:       io.NopCloser(bytes.NewReader(body)),
		Header: http.Header{
			"Content-Type": []string{"text/plain"},
			"Server":       []string{"Benchmark"},
		},
		Proto:      "HTTP/1.1",
		ProtoMajor: 1,
		ProtoMinor: 1,
	}
	rc := NewResponseChain(resp, -1)
	_ = rc.Fill()
	defer rc.Close()
	b.Run("String", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.FullResponseString()
		}
	})
	b.Run("String-Concat", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.HeadersString() + rc.BodyString()
		}
	})
	b.Run("Bytes", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.FullResponseBytes()
		}
	})
	b.Run("Bytes-Concat", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			h := rc.HeadersBytes()
			bo := rc.BodyBytes()
			buf := make([]byte, len(h)+len(bo))
			copy(buf, h)
			copy(buf[len(h):], bo)
		}
	})
}
```

```console
$ go test -run - -bench=BenchmarkResponseChain_FullResponse -benchmem -count 10 ./http | tee /tmp/fullresp
$ cat /tmp/fullresp | grep "/String-16" | tee /tmp/fullresp-string
$ cat /tmp/fullresp | grep "/Bytes-16" | tee /tmp/fullresp-bytes
$ cat /tmp/fullresp | grep "/String-Concat-16" | sed "s/Concat\-//g" | tee /tmp/fullresp-string-concat
$ cat /tmp/fullresp | grep "/Bytes-Concat-16" | sed "s/Concat\-//g" | tee /tmp/fullresp-bytes-concat
```

```console
$ benchstat fullresp-string fullresp-string-concat
goos: linux
goarch: amd64
pkg: github.com/projectdiscovery/utils/http
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
                                     │ fullresp-string │  fullresp-string-concat  │
                                     │     sec/op      │    sec/op      vs base   │
ResponseChain_FullResponse/String-16      529.3µ ± 24%   598.3µ ± 21%  ~ (p=0.853 n=10)
                                     │ fullresp-string │  fullresp-string-concat  │
                                     │      B/op       │     B/op       vs base   │
ResponseChain_FullResponse/String-16     1.008Mi ±  0%  1.008Mi ±  0%  ~ (p=0.169 n=10)
                                     │ fullresp-string │  fullresp-string-concat  │
                                     │    allocs/op    │  allocs/op     vs base   │
ResponseChain_FullResponse/String-16       1.000 ±  0%    1.000 ±  0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

$ benchstat fullresp-bytes fullresp-bytes-concat
goos: linux
goarch: amd64
pkg: github.com/projectdiscovery/utils/http
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
                                    │ fullresp-bytes │  fullresp-bytes-concat  │
                                    │     sec/op     │    sec/op     vs base   │
ResponseChain_FullResponse/Bytes-16     557.9µ ± 10%   564.0µ ± 21%  ~ (p=1.000 n=10)
                                    │ fullresp-bytes │  fullresp-bytes-concat  │
                                    │      B/op      │     B/op      vs base   │
ResponseChain_FullResponse/Bytes-16    1.008Mi ±  0%  1.008Mi ±  0%  ~ (p=1.000 n=10)
                                    │ fullresp-bytes │  fullresp-bytes-concat  │
                                    │   allocs/op    │  allocs/op    vs base   │
ResponseChain_FullResponse/Bytes-16      1.000 ±  0%    1.000 ±  0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```
Additional context:
p.s. the (prior to this fix) behavior was observed in https://github.com/projectdiscovery/nuclei/actions/runs/19680155957/job/56400995022?pr=6629 -- the perf-regression job hung indefinitely, so I canceled it manually (after 2h+, lol, I didn't notice) -- also confirmed locally:

```console
nuclei> make build-test; timeout 5m ./bin/nuclei.test -test.run - -test.bench=. -test.benchmem ./cmd/nuclei/.
```