[Core] [Doc] Add OSS Document for Task Events #56203
Signed-off-by: Mengjin Yan <[email protected]>
Ray event export feature is still in alpha version. The way to configure event
reporting and the format of the events are subject to change.

Enable event reporting
What do you think about having a section briefly describing the architecture of the aggregator agent and listing some env vars to configure the buffer?
I think it is a good idea to describe the architecture. I'll add one section for that.
At the same time, I'm a little hesitant to expose the buffer configuration now.
The reasons are:
(1) env vars might not be a good way for users to configure Ray, and I'm planning to revamp the section after we check in the code that adds the command-line parameter and ray.init() parameter for configuring the event export destination;
(2) we've already exposed a lot of APIs to users, and we should only expose new ones if they are absolutely necessary. So without user feedback, I think we should default to exposing as few APIs as possible.

Enable event reporting
----------------------
To enable event reporting, you need to set the ``RAY_enable_core_worker_ray_event_to_aggregator`` environment
Should the env var be something like RAY_enable_ray_event_export, given that events can come from the raylet, GCS, etc. as well?
From my perspective, this env var will only be used temporarily, before we add the public API to the ray start script and ray.init() and before we migrate the GCS path. With those two changes done, we can remove the environment variable and use only the values configured through the public API to determine whether to send the event to GCS. We will revamp the doc then as well.
Got it.
variable to ``1`` when starting each Ray worker node.

To set the target HTTP end point, you need to set the ``RAY_events_export_addr``
environment variable. The value should be a valid HTTP URL with the scheme of ``http``.
can it be https?
Currently it only supports http. I think we can add https support later.
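As a rough illustration of the two settings discussed in this thread, the sketch below builds the environment for a Ray node process. The helper name `event_export_env` is hypothetical; only the two variable names come from the doc under review, and the `http`-only check reflects the reply above. Both settings are described as alpha, so treat this as a sketch under those assumptions rather than a recommended setup.

```python
import os
from urllib.parse import urlparse


def event_export_env(export_addr, base_env=None):
    """Return a copy of the environment with Ray event export enabled.

    Hypothetical helper: only the two variable names come from the doc.
    """
    parsed = urlparse(export_addr)
    # Per the review thread, only the http scheme is supported for now.
    if parsed.scheme != "http" or not parsed.netloc:
        raise ValueError(f"expected an http:// URL, got {export_addr!r}")
    # Copy the base environment so the caller's mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_enable_core_worker_ray_event_to_aggregator"] = "1"
    env["RAY_events_export_addr"] = export_addr
    return env


# Example (not run here): start a node with the variables set.
# import subprocess
# subprocess.run(["ray", "start", "--head"],
#                env=event_export_env("http://127.0.0.1:8000"))
```

The same environment would need to be applied on each worker node, since the doc says the variable is set when starting each node.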
Events are JSON objects in the POST request body.

All events contain the same base fields and different event-specific fields.
See ``src/ray/protobuf/events_base_event.proto`` for the base fields.
can you create a link to the file?
"2":"2025-09-03T18:52:14.467402Z",
"1":"2025-09-03T18:52:14.467290Z",
"5":"2025-09-03T18:52:14.469074Z"
oh, we show 1,2,3 here instead of the state string?

For each task, Ray exports two types of events: Task Definition Event and Task Execution Event.

* A Task Definition Event is generated once per task attempt. It contains the metadata of the task. See ``src/ray/protobuf/events_task_definition_event.proto`` and ``src/ray/protobuf/actor_task_definition_event.proto`` for the event formats of normal tasks and actor tasks, respectively.
nit: this should be src/ray/protobuf/public/events_task_definition_event.proto, and likewise for the other paths (currently src/ray/protobuf/events_task_definition_event.proto).
"taskExecutionEvent":{
"taskId":"yO9FzNARJXH///////////////8BAAAA",
"taskState":{
"2":"2025-09-03T18:52:14.467402Z",
qq: what states do these numbers (2, 1, etc.) map to?
They map to the TaskStatus in common.proto
maybe we can point to that file next to one of these numbers in a comment
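To make the numeric keys easier to read on the consumer side, one option is to sort the `taskState` entries by timestamp and translate the codes with a caller-supplied table. The sketch below assumes the keys are `TaskStatus` enum numbers from common.proto, as stated in the reply above. It deliberately does not reproduce the enum values: `status_names` is a hypothetical mapping that the caller would fill in from that file, and `ordered_task_states` is a hypothetical helper name.

```python
from datetime import datetime


def ordered_task_states(task_state, status_names=None):
    """Return (state_name, timestamp) pairs sorted by timestamp.

    task_state: the "taskState" object from a task execution event.
    status_names: hypothetical {int: str} table built from common.proto.
    """
    status_names = status_names or {}
    entries = []
    for code, ts in task_state.items():
        # fromisoformat() on older Python versions rejects a trailing "Z".
        when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        # Fall back to a placeholder label when the code is not in the table.
        name = status_names.get(int(code), f"TaskStatus({code})")
        entries.append((when, name))
    entries.sort(key=lambda entry: entry[0])
    return [(name, when) for when, name in entries]
```

Without a mapping, the function still returns the transitions in chronological order, labeled by their numeric code.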
I wonder if we can add a simple example of how to set up a local HTTP endpoint for testing.
Good point! I think later we can add an end-to-end example for the events once the configs are finalized. Added an action item in the followup GitHub issue: #54515
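Until an end-to-end example exists, a local receiver for testing could look roughly like the sketch below. It uses only the Python standard library, accepts any JSON body sent with POST, and keeps the decoded payloads in memory. The names `EventSinkHandler`, `start_event_sink`, and `received_events` are hypothetical, and nothing here depends on the exact event schema, which the doc describes as subject to change.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store so received payloads can be inspected.
received_events = []


class EventSinkHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            # Reject bodies that are not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        received_events.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_event_sink(host="127.0.0.1", port=0):
    """Start the sink in a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), EventSinkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling `start_event_sink(port=8000)`, the value for ``RAY_events_export_addr`` would be `http://127.0.0.1:8000`, assuming the Ray node runs on the same machine.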
Co-authored-by: Jiajun Yao <[email protected]> Signed-off-by: Mengjin Yan <[email protected]>
@dayshah Could you help review this from a docs perspective? Thanks!
Events are JSON objects in the POST request body.

All events contain the same base fields and different event-specific fields.
See `src/ray/protobuf/events_base_event.proto <https://github.com/ray-project/ray/blob/master/src/ray/protobuf/events_base_event.proto>`_ for the base fields.
@angelinalg @dstrodtman do you know how I can make the link point to the right version? For example, if the doc is for 2.49, I want to point to ray/blob/releases-2.49/xxx, and if the doc is for master, I want to point to ray/blob/master/xxx.
this is merged now so you can point this to the public directory as well
Move events_base_event_proto to the public proto directory. Will merge and update doc after #56203. Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>
Ray Event Export
================

Ray supports exporting structured events to a configured HTTP endpoint. Each worker node
Specify "in the past", or from which version? It reads a little oddly because the next sentence says "from 2.49".
Also link to the old feature.
Good point. Will do.
generated during the task execution.
See `src/ray/protobuf/public/events_task_execution_event.proto <https://github.com/ray-project/ray/blob/master/src/ray/protobuf/public/events_task_execution_event.proto>`_ for the event format.

An example of the task events is as follows:
Suggested change: "An example of a Task Definition Event followed by a Task Execution Event:"? I don't know if this is right.
Yeah, I mainly want to provide one Task Definition Event and one Task Execution Event. I'll probably update the doc to say that.
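One detail of the example events that may surprise readers is the `taskId` value, which looks like base64 text and not the hex string Ray usually prints. Protobuf's JSON mapping encodes `bytes` fields as base64, so assuming `taskId` is such a field, a consumer could convert it as sketched below. The helper name `task_id_hex` is hypothetical.

```python
import base64


def task_id_hex(task_id_b64):
    """Decode a base64 task ID from the JSON payload into a hex string.

    Assumes the field is a protobuf bytes field serialized as base64.
    """
    # validate=True raises binascii.Error on characters outside the alphabet.
    return base64.b64decode(task_id_b64, validate=True).hex()
```

For instance, `task_id_hex("yO9FzNARJXH///////////////8BAAAA")` would convert the ID from the example event shown in the diff.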
Co-authored-by: Dhyey Shah <[email protected]> Signed-off-by: Mengjin Yan <[email protected]>
Move events_base_event_proto to the public proto directory. Will merge and update doc after ray-project#56203. Test: - CI Signed-off-by: Cuong Nguyen <[email protected]> Signed-off-by: sampan <[email protected]>
Why are these changes needed?
This PR adds a page under the observability user guide for Ray event export and task events.
It contains basic information about how to enable and configure the feature, and the current format of the task events with an example.
As the feature matures, the page should be updated with a better way to configure the feature and more explanation of the event format, along with more examples.
Related issue number
N/A