Skip to content

Conversation

@dwisiswant0
Copy link
Member

fix(httputil): racy in ResponseChain

and optimize memory usage.

This commit addresses a critical regression and
improves memory efficiency in the httputil pkg.

Changes:

  • Previously, it returned a slice referencing a
    buffer that was immediately returned to the
    pool, leading to data corruption
    (use-after-free). It now allocates a safe,
    independent copy.
  • Make Close() idempotent. Calling Close()
    multiple times no longer causes a panic.
  • Optimize FullResponseString to use zero-copy
    conversion from the safe byte slice.
  • Optimize limitedBuffer to use a sync.Pool
    for temp 32KB (via chunkSize =
    DefaultChunkSize) read chunks.

Additional context:

p.s. (prior to this) behavior was observed in https://github.com/projectdiscovery/nuclei/actions/runs/19680155957/job/56400995022?pr=6629 -- that the perf-regression job hung indefinitely, so I canceled manually (after 2h+, lol, I didn't notice) -- also confirmed locally; nuclei> make build-test; timeout 5m ./bin/nuclei.test -test.run - -test.bench=. -test.benchmem ./cmd/nuclei/.

and optimize memory usage.

This commit addresses a critical regression and
improves memory efficiency in the `httputil` pkg.

Changes:
* Previously, it returned a slice referencing a
  buffer that was immediately returned to the
  pool, leading to data corruption
  (use-after-free). It now allocates a safe,
  independent copy.
* Make `Close()` idempotent. Calling `Close()`
  multiple times no longer causes a panic.
* Optimize `FullResponseString` to use zero-copy
  conversion from the safe byte slice.
* Optimize `limitedBuffer` to use a `sync.Pool`
  for temp 32KB (via `chunkSize` =
  `DefaultChunkSize`) read chunks.

Signed-off-by: Dwi Siswanto <[email protected]>
Comment on lines -308 to 309
// The returned slice is valid only until Close() is called.
// Note: This creates a new buffer internally which is returned to the pool.
// The returned slice is a copy and remains valid even after Close() is called.
func (r *ResponseChain) FullResponseBytes() []byte {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and (*ResponseChain).FullResponseString()), the inconsistency is intentional -- those are now a "safe copy". The previous implementation (#700, yeah, my bad) failed to manage the lifecycle correctly (returning a ptr to a buffer that was immediately freed), which caused the race condition.

And since (*ResponseChain).fullResponse field was removed and headers+body are now stored in separate buffers, we must (forced anyway) alloc new memory to concat them -- rather than returning a fragile view into a temp pooled buffer (which is racy).

tl;dr: (*ResponseChain).FullResponse* is now a copy because physics demands it (non-contiguous source data).

Copy link
Member Author

@dwisiswant0 dwisiswant0 Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we retract v0.7.0, @Mzack9999?

EDIT:

I don’t think we need to retract that since the issues are in the new API(s) anyway.

@dwisiswant0
Copy link
Member Author

dwisiswant0 commented Nov 26, 2025

Bench:

func BenchmarkResponseChain_FullResponse(b *testing.B) {
	body := bytes.Repeat([]byte("B"), 1024*1024) // 1MB
	resp := &http.Response{
		StatusCode: 200,
		Status:     "200 OK",
		Body:       io.NopCloser(bytes.NewReader(body)),
		Header: http.Header{
			"Content-Type": []string{"text/plain"},
			"Server":       []string{"Benchmark"},
		},
		Proto:      "HTTP/1.1",
		ProtoMajor: 1,
		ProtoMinor: 1,
	}

	rc := NewResponseChain(resp, -1)
	_ = rc.Fill()
	defer rc.Close()

	b.Run("String", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.FullResponseString()
		}
	})

	b.Run("String-Concat", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.HeadersString() + rc.BodyString()
		}
	})

	b.Run("Bytes", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			_ = rc.FullResponseBytes()
		}
	})

	b.Run("Bytes-Concat", func(b *testing.B) {
		b.ReportAllocs()
		for b.Loop() {
			h := rc.HeadersBytes()
			bo := rc.BodyBytes()
			buf := make([]byte, len(h)+len(bo))
			copy(buf, h)
			copy(buf[len(h):], bo)
		}
	})
}
$ go test -run - -bench=BenchmarkResponseChain_FullResponse -benchmem -count 10 ./http | tee /tmp/fullresp
$ cat /tmp/fullresp | grep "/String-16" | tee /tmp/fullresp-string
$ cat /tmp/fullresp | grep "/Bytes-16" | tee /tmp/fullresp-bytes
$ cat /tmp/fullresp | grep "/String-Concat-16" | sed "s/Concat\-//g" | tee /tmp/fullresp-string-concat
$ cat /tmp/fullresp | grep "/Bytes-Concat-16" | sed "s/Concat\-//g" | tee /tmp/fullresp-bytes-concat
$ benchstat fullresp-string fullresp-string-concat
goos: linux
goarch: amd64
pkg: github.com/projectdiscovery/utils/http
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
                                     │ fullresp-string │     fullresp-string-concat     │
                                     │     sec/op      │    sec/op     vs base          │
ResponseChain_FullResponse/String-16      529.3µ ± 24%   598.3µ ± 21%  ~ (p=0.853 n=10)

                                     │ fullresp-string │     fullresp-string-concat     │
                                     │      B/op       │     B/op      vs base          │
ResponseChain_FullResponse/String-16      1.008Mi ± 0%   1.008Mi ± 0%  ~ (p=0.169 n=10)

                                     │ fullresp-string │     fullresp-string-concat     │
                                     │    allocs/op    │ allocs/op   vs base            │
ResponseChain_FullResponse/String-16        1.000 ± 0%   1.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
$ benchstat fullresp-bytes fullresp-bytes-concat
goos: linux
goarch: amd64
pkg: github.com/projectdiscovery/utils/http
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
                                    │ fullresp-bytes │     fullresp-bytes-concat      │
                                    │     sec/op     │    sec/op     vs base          │
ResponseChain_FullResponse/Bytes-16     557.9µ ± 10%   564.0µ ± 21%  ~ (p=1.000 n=10)

                                    │ fullresp-bytes │     fullresp-bytes-concat      │
                                    │      B/op      │     B/op      vs base          │
ResponseChain_FullResponse/Bytes-16     1.008Mi ± 0%   1.008Mi ± 0%  ~ (p=1.000 n=10)

                                    │ fullresp-bytes │     fullresp-bytes-concat      │
                                    │   allocs/op    │ allocs/op   vs base            │
ResponseChain_FullResponse/Bytes-16       1.000 ± 0%   1.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

@Mzack9999 Mzack9999 merged commit c1bbf17 into main Nov 28, 2025
7 checks passed
@Mzack9999 Mzack9999 deleted the dwisiswant0/fix/httputil/racy-in-ResponseChain branch November 28, 2025 14:16
@Mzack9999 Mzack9999 linked an issue Nov 28, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

httputil: FullResponse* returns pooled buffer

3 participants