[serve] Fix buffered logging reusing request context (Fixes #55851) #56094
base: master
Conversation
…ct#55851) Signed-off-by: Vaishnavi Panchavati <[email protected]>
Code Review
This pull request effectively addresses the issue of buffered logs reusing request contexts by capturing the context at emit time. The approach of wrapping logger methods for buffered logs is sound, and making ServeContextFilter idempotent is a necessary change. My feedback includes a couple of suggestions to improve the robustness of the new wrap_logger_for_buffering function, to prevent potential side effects and make its signature more explicit.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Vaishnavi Panchavati <[email protected]>
Thanks for the analysis on this. When I compare:
I can conclude that it's better to remove MemoryHandler than to apply this fix. MemoryHandler was originally added to improve performance. Can you think of a more optimal solution? If not, I suggest we drop the MemoryHandler. @akyang-anyscale, any other ideas?
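For context, the buffering under discussion is a standard-library MemoryHandler sitting in front of the file handler. A minimal, generic sketch of that wiring (not Serve's actual setup) is:

import logging
import logging.handlers

logger = logging.getLogger("serve_example")
file_handler = logging.FileHandler("serve.log")

# MemoryHandler keeps records in memory and only forwards them to the target
# file handler once `capacity` records have accumulated (or a record at the
# flush level arrives), trading per-record file I/O for throughput.
buffered_handler = logging.handlers.MemoryHandler(capacity=1000, target=file_handler)
logger.addHandler(buffered_handler)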
I tried running these tests 4-5 more times and found the following stats.
The latency differs slightly between runs, so these were the best figures out of those 4-5 runs.
I also have a couple of questions here:
I have analyzed this approach, and I have removed adding the filter to the file handler.
I feel this method is more robust since we are not explicitly adding wrappers around specific logging levels. The latency comparison is shown below:
Signed-off-by: Vaishnavi Panchavati <[email protected]>
Updated the code with these changes and benchmarked the results here.
Any explanation for why there is a difference? Let's also add a test covering this.
The performance of buffer+original and buffer+fix is almost identical. Sometimes one performs better than the other, but they are very close. I don't see a concrete reason for the difference, because they are effectively doing the same thing in a different order. For the test case, I am thinking of implementing a small deployment with logging and sending more requests than buffer_size with buffering enabled, then counting the occurrences of replica IDs in the generated logs (system and application).
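A rough sketch of the kind of test deployment described above (deployment name, port, and log messages here are placeholders, not the final test):

import logging

import requests
from ray import serve


@serve.deployment
class TestApp:
    def __init__(self):
        # "ray.serve" is the logger Serve deployments use for application logs.
        self._logger = logging.getLogger("ray.serve")

    def __call__(self, request) -> str:
        # Two application log lines per request; Serve adds one access-log line,
        # so each request ID should appear about three times in the file logs.
        self._logger.info("Processing request")
        self._logger.info("Additional log entry")
        return "ok"


serve.run(TestApp.bind())
for _ in range(150):  # more requests than the buffer size, to force flushes
    requests.get("http://localhost:8000/")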
Signed-off-by: Vaishnavi Panchavati <[email protected]>
Added a test which checks for reuse of request IDs in the buffering case.
request_id = log_entry.get("request_id", None)
if request_id:
    log_request_ids.append(request_id)
# Verify no excessive duplication
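For reference, log_entry here would come from parsing the replica's file logs, assuming JSON-encoded lines (log_file_path is a placeholder):

import json

log_entries = []
with open(log_file_path) as f:  # placeholder path to the replica's log file
    for line in f:
        try:
            log_entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines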
Why tolerate any duplication?
So the logs also print each log message here. Since I have 2 log messages from the application and one access log message for each request ID, we get about 3 every time. This buffering is only seen with application logs.
=== Request ID: d5587f91-88db-4c5b-95aa-a7139c081ee1 (appears 3 times) ===
1 Additional log entry
1 GET / 200 0.5ms
1 Processing request
=== Request ID: d9c323cb-67f0-4de1-8658-8c43091dca18 (appears 3 times) ===
1 Additional log entry
1 GET / 200 0.7ms
1 Processing request
=== Request ID: da5a5dac-c07e-4522-9b22-bbc7a91fe95b (appears 3 times) ===
1 Additional log entry
1 GET / 200 0.6ms
1 Processing request
=== Request ID: dd74bfc1-9589-4835-ab23-cb5f4d5d8f2c (appears 3 times) ===
1 Additional log entry
1 GET / 200 0.6ms
1 Processing request
Gotcha, then it's better to assert:
- the exact count instead of assert count <= 5
- that each of the 3 log messages has unique contents.
This will make the test more robust and easier to read.
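A sketch of what those stricter assertions could look like (the count of 3 comes from the discussion above; log_entries and other names are placeholders):

from collections import Counter

request_id_counts = Counter(log_request_ids)
for request_id, count in request_id_counts.items():
    # Exactly three lines per request: two application messages plus the access log.
    assert count == 3, f"{request_id} appeared {count} times"
    messages = {
        entry["message"]
        for entry in log_entries
        if entry.get("request_id") == request_id
    }
    # The three lines should have distinct contents, not one message repeated.
    assert len(messages) == 3, f"duplicate log contents for {request_id}"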
serve.run(TestApp.bind())
for _ in range(150 if buffer_size > 1 else 20): |
Use the same number of requests regardless; I see no harm.
I will fix this to a single number.
Signed-off-by: Vaishnavi Panchavati <[email protected]>
def test_request_id_uniqueness_with_buffering(buffer_size, monkeypatch):
    """Test request IDs are unique when buffering is enabled."""

    monkeypatch.setenv("RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE", str(buffer_size))
I don't think setting the env variable here works; you should be able to verify this by logging the buffer size inside Serve. Essentially, setting the env variable after Serve has already started has no effect.
See test_configure_default_serve_logger_with_stderr_redirect for how to set the env variable correctly. Something like:
@pytest.mark.parametrize(
    "ray_instance",
    [
        {"RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE": "100"},
    ],
    indirect=True,
)
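For illustration, the full test header might then look roughly like this, assuming the conftest's ray_instance fixture applies the env vars before starting Ray (as in the referenced test):

import pytest


@pytest.mark.parametrize(
    "ray_instance",
    [
        {"RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE": "100"},
    ],
    indirect=True,
)
def test_request_id_uniqueness_with_buffering(ray_instance):
    # The env variable is set by the fixture before Ray/Serve starts,
    # so the Serve logger setup actually picks up the buffer size.
    ...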
Why are these changes needed?
Currently, when Serve file logs are buffered via a MemoryHandler, ServeContextFilter fetches the Serve request context at flush time instead of when the log record is emitted. As a result, many log records flushed together can share the same request context, breaking per-request tracing. This PR captures the request context at emit time when buffering is enabled and makes the filter idempotent so it won't overwrite pre-populated fields. This preserves correct per-record context for buffered file logs without changing non-buffered behavior.
Related issue number
Closes #55851
Performance Testing
Manual verification: benchmarked both buffered and non-buffered cases with and without the fix.
Performance: used Locust with 100 users for a duration of 3-4 minutes.
Without buffering:
- With fix: Avg: 396.69 ms, P99: 580 ms, RPS: 228.4
- Without fix: Avg: 391.29 ms, P99: 560 ms, RPS: 239
With buffering (RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE = 1000):
- With fix: Avg: 400.83 ms, P99: 620 ms, RPS: 230.5
- Without fix: Avg: 373.25 ms, P99: 610 ms, RPS: 249.4
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.