Conversation

vaishdho1

Why are these changes needed?

Currently, when Serve file logs are buffered via a MemoryHandler, ServeContextFilter fetches the Serve request context at flush time instead of when the log record is emitted. As a result, many log records flushed together can share the same request context, breaking per-request tracing.
This PR captures the request context at emit time when buffering is enabled and makes the filter idempotent so it won't overwrite pre-populated fields. This preserves the correct per-record context for buffered file logs without changing non-buffered behavior.
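To illustrate the idempotency part, here is a minimal sketch (not the actual Ray Serve implementation; the context lookup below is a hypothetical placeholder) of a filter that only fills in fields that were not already captured at emit time:

import logging


def _current_request_id() -> str:
    """Placeholder for Serve's request-context lookup (hypothetical)."""
    return "dummy-request-id"


class IdempotentContextFilter(logging.Filter):
    """Attach request context unless the record was already stamped at emit time."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Records buffered by a MemoryHandler may already carry a request_id that
        # was captured when the record was emitted; only fill it in if missing.
        if getattr(record, "request_id", None) is None:
            record.request_id = _current_request_id()
        return True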

Related issue number

Closes #55851

Performance Testing

Manual verification: benchmarked both the buffered and non-buffered cases, with and without the fix.
Load generation: Locust with 100 users for a duration of 3-4 minutes.

Without buffering:
With fix: Avg 396.69 ms, P99 580 ms, RPS 228.4
Without fix: Avg 391.29 ms, P99 560 ms, RPS 239

With buffering (RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE = 1000):
With fix: Avg 400.83 ms, P99 620 ms, RPS 230.5
Without fix: Avg 373.25 ms, P99 610 ms, RPS 249.4

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@vaishdho1 vaishdho1 requested a review from a team as a code owner August 29, 2025 20:31
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the issue of buffered logs reusing request contexts by capturing the context at emit time. The approach of wrapping logger methods for buffered logs is sound, and making ServeContextFilter idempotent is a necessary change. My feedback includes a couple of suggestions to improve the robustness of the new wrap_logger_for_buffering function to prevent potential side effects and make its signature more explicit.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Vaishnavi Panchavati <[email protected]>
@ray-gardener ray-gardener bot added serve Ray Serve Related Issue community-contribution Contributed by the community labels Aug 30, 2025
@abrarsheikh
Contributor

Thanks for the analysis on this.

When I compare:

  1. Buffering with fix: Avg 400.83 ms, P99 620 ms, RPS 230.5
  2. No buffering without fix: Avg 391.29 ms, P99 560 ms, RPS 239

I conclude that it's better to remove the MemoryHandler than to apply this fix. The reason the MemoryHandler was added was to improve performance. Can you think of a more optimal solution? If not, I suggest we drop the MemoryHandler. @akyang-anyscale, any other ideas?

@vaishdho1
Author

I ran these tests 4-5 more times and found the following stats.

Case                 Requests   Avg (ms)   P99 (ms)   RPS
Buffer-original      33372      345.32     540        291
Buffer-fix           31857      349.41     550        281.9
Nonbuffer-original   38024      381        560        253.6
Nonbuffer-fix        31662      376.73     550        249.7

Latency differs slightly between runs, so these were the best figures after 4-5 runs.

  • There is another method I can think of. We can add the ServeContextFilter() to the MemoryHandler for the buffered case instead of adding it to the file handler directly. With the filter on the MemoryHandler, the context is attached in the buffered case as well. But I need to check the latency here.

  • Another method is to use the LogRecord factory and add the context to the log record. Similar logic is already implemented inside https://github.com/ray-project/ray/blob/master/python/ray/_private/log.py#L71 for adding a custom time to all logs. Something similar could be used inside Serve for the context. But I am not sure how this will affect the flow; I need to look at this. A rough sketch of the idea follows below.
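A rough sketch of that record-factory idea, assuming a generic context source (the ContextVar below is a stand-in for Serve's real request context, not the actual implementation):

import contextvars
import logging

# Stand-in for Serve's per-request context (hypothetical).
_request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "serve_request_id", default=""
)

_old_factory = logging.getLogRecordFactory()


def _record_factory(*args, **kwargs) -> logging.LogRecord:
    record = _old_factory(*args, **kwargs)
    # The context is captured when the record is created (emit time), so a
    # MemoryHandler flushing later cannot smear one context across many records.
    record.request_id = _request_id_var.get()
    return record


logging.setLogRecordFactory(_record_factory)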

I also have a couple of questions here:

  1. The example I checked is a very small application with just 2 deployments. For a bigger application with more deployments and more logging, the latency without buffering might increase, right? So the memory handler could help in that case.
  2. What are the thresholds for the different metrics beyond which we decide the latency regression is too large?

@vaishdho1
Author

I have analyzed adding ServeContextFilter directly to the memory handler.
In this case, the filter is attached to the MemoryHandler instead of the file handler, since records always pass through the MemoryHandler before reaching the file handler in both the buffered and non-buffered cases.

I removed the code that adds the filter to the file handler and instead attach it to the MemoryHandler:

if (
    logging_config.enable_access_log
    or RAY_SERVE_ENABLE_JSON_LOGGING
    or logging_config.encoding == EncodingType.JSON
):
    memory_handler.addFilter(ServeContextFilter())

I feel this method is more robust: the MemoryHandler's filters run when the record is first handled (i.e., at emit time, before it is buffered), and we are not explicitly adding wrappers around specific logging levels.

The latency comparison is shown below:

Method                    Avg (ms)   P99 (ms)   RPS
Original + buffering      324.68     550        280.9
Original + no buffering   373.72     550        250.9
Fix + buffering           330        530        307.6
Fix + no buffering        359.83     540        272.4

@vaishdho1
Author

Fixed the code with the changes and benchmarked the results here.

Method                    Requests   Avg (ms)   P99 (ms)   RPS
Buffering + fix           25643      269.8      440        350.1
No buffering + fix        24844      308        460        305.4
Buffering + original      25582      271.6      460        335
No buffering + original   24818      303.94     470        303.94

@abrarsheikh
Contributor

Any explanation for why Buffering+fix performs better?

Let's add a test in test_logging_utils.py to make sure that when buffering is used, request_id is not duplicated.

@vaishdho1
Author

The performance of buffer+original and buffer+fix is almost identical. Sometimes one performs better than the other, but they are very close. I don't see a concrete reason for the difference, because they are effectively doing the same thing, just in a different order.

For the test case, I am thinking of implementing a small deployment with logging, sending more requests than the buffer size with buffering enabled, and then counting the occurrences of replica IDs in the generated logs (system and application). A rough sketch of what I have in mind is below.
Would this be sufficient for the use case?
I can add this directly under ray/serve/tests/test_logging.py.
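Roughly the counting step I have in mind; a sketch only, where the log directory, file pattern, and the field name (request_id used here as an example) are assumptions rather than the verified Serve log layout:

import json
from collections import Counter
from pathlib import Path


def count_context_ids(log_dir: str, field: str = "request_id") -> Counter:
    """Count how often each value of `field` appears across JSON-encoded log lines."""
    counts: Counter = Counter()
    for log_file in Path(log_dir).glob("*.log"):
        for line in log_file.read_text().splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines (e.g. plain-text system logs)
            value = entry.get(field)
            if value:
                counts[value] += 1
    return counts

# With N requests and a fixed number of log lines per request, every id should
# appear that fixed number of times; much larger counts would indicate reuse.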

@vaishdho1
Author

Added a test that checks for reuse of request IDs in the buffering case.

request_id = log_entry.get("request_id", None)
if request_id:
    log_request_ids.append(request_id)
# Verify no excessive duplication
Contributor


Why tolerate any duplication?

Author

@vaishdho1 vaishdho1 Sep 4, 2025


The logs also print each log message here. Since I have 2 log messages from the application and one access log message for each request ID, we get about 3 occurrences every time. This buffering is only seen with the application logs.

=== Request ID: d5587f91-88db-4c5b-95aa-a7139c081ee1 (appears 3 times) ===
   1 Additional log entry
   1 GET / 200 0.5ms
   1 Processing request

=== Request ID: d9c323cb-67f0-4de1-8658-8c43091dca18 (appears 3 times) ===
   1 Additional log entry
   1 GET / 200 0.7ms
   1 Processing request

=== Request ID: da5a5dac-c07e-4522-9b22-bbc7a91fe95b (appears 3 times) ===
   1 Additional log entry
   1 GET / 200 0.6ms
   1 Processing request

=== Request ID: dd74bfc1-9589-4835-ab23-cb5f4d5d8f2c (appears 3 times) ===
   1 Additional log entry
   1 GET / 200 0.6ms
   1 Processing request

Contributor


Gotcha. Then it's better to assert:

  1. the exact count instead of assert count <= 5, and
  2. that each of the 3 log messages has unique contents.

This will make the test more robust and easier to read.
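A sketch of what those assertions could look like, using example data shaped like the output above (the messages_by_request_id mapping is hypothetical and would be built while parsing the logs):

# Example shape: request_id -> list of log messages parsed for that request.
messages_by_request_id = {
    "d5587f91-88db-4c5b-95aa-a7139c081ee1": [
        "Processing request",
        "Additional log entry",
        "GET / 200 0.5ms",
    ],
}

for request_id, messages in messages_by_request_id.items():
    # Exactly 3 records per request: 2 application logs + 1 access log.
    assert len(messages) == 3, f"{request_id} appeared {len(messages)} times"
    # All 3 messages should be distinct, i.e. no record was duplicated.
    assert len(set(messages)) == 3, f"duplicate messages for {request_id}"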


serve.run(TestApp.bind())

for _ in range(150 if buffer_size > 1 else 20):
Contributor


Use the same number of requests regardless; I see no harm.

Author


I will fix this to a single number.

def test_request_id_uniqueness_with_buffering(buffer_size, monkeypatch):
    """Test request IDs are unique when buffering is enabled."""

    monkeypatch.setenv("RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE", str(buffer_size))
Contributor


I don't think setting the env variable here works; you should be able to verify this by logging the buffer size inside Serve. Essentially, setting the env variable after Serve has already started has no effect.
See test_configure_default_serve_logger_with_stderr_redirect for how to set the env variable correctly. Something like:

@pytest.mark.parametrize(
    "ray_instance",
    [
        {"RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE": "100"},
    ],
    indirect=True,
)

Development

Successfully merging this pull request may close these issues.

[Serve] Using RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE reuses replica context across logs