MengjinYan
Contributor

Why are these changes needed?

This PR adds a page under observability user guide for Ray event export and task events.

It contains basic information about how to enable/configure the feature and the current format of the task events with an example.

As the feature is becoming more mature, the page should be updated with a better way to configure the feature and more explanation about the event format & examples.

Related issue number

N/A

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Mengjin Yan <[email protected]>
@MengjinYan MengjinYan marked this pull request as ready for review September 3, 2025 19:43
@MengjinYan MengjinYan requested a review from a team as a code owner September 3, 2025 19:43
@can-anyscale can-anyscale self-requested a review September 3, 2025 19:48
Ray event export feature is still in alpha version. The way to configure event
reporting and the format of the events are subject to change.

Enable event reporting
Contributor

What do you think about having a section briefly describing the architecture of the aggregator agent and listing some env vars to configure the buffer?

Contributor Author

I think it is a good idea to describe the architecture. I'll add one section for that.

At the same time, I'm a little hesitant to expose the buffer configuration now.

The reasons are:
(1) env vars might not be a good way for users to configure Ray, and I'm planning to revamp this section after we check in the code that adds the command-line parameter and ray.init() parameter for configuring the event export destination;
(2) we already expose a lot of APIs to users today, and we should only expose new ones if they are absolutely necessary. So without user feedback, I think we should default to exposing as few APIs as possible.


Enable event reporting
----------------------
To enable event reporting, you need to set the ``RAY_enable_core_worker_ray_event_to_aggregator`` environment
Collaborator

Suggested change
To enable event reporting, you need to set the ``RAY_enable_core_worker_ray_event_to_aggregator`` environment
To enable event reporting, you need to set the ``RAY_enable_core_worker_ray_event_to_aggregator`` environment

Should the env var be something like RAY_enable_ray_event_export given events can be from raylet, gcs etc as well?

Contributor Author

From my perspective, this env var will only be used temporarily, until we add the public API to the ray start script and ray.init() and until we migrate the GCS path. With those two changes done, we can remove the environment variable and use only the values configured through the public API to determine whether to send events to GCS. We will revamp the doc then as well.

Collaborator

Got it.

variable to ``1`` when starting each Ray worker node.

To set the target HTTP end point, you need to set the ``RAY_events_export_addr``
environment variable. The value should be a valid HTTP URL with the scheme of ``http``.
Collaborator

can it be https?

Contributor Author

Currently it only supports http. I think we can add https support later.
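For readers following the configuration discussed in this thread, here is a minimal sketch of how the two environment variables could be assembled before starting a node. The helper name `build_event_export_env` and the example address are hypothetical; only the variable names and the `http`-only restriction come from the conversation above, and both are described as temporary.

```python
import os
from urllib.parse import urlparse

# Variable names as quoted in the PR; the author notes they are temporary.
ENABLE_VAR = "RAY_enable_core_worker_ray_event_to_aggregator"
ADDR_VAR = "RAY_events_export_addr"


def build_event_export_env(addr, base_env=None):
    """Return a copy of the environment with event export switched on.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    parsed = urlparse(addr)
    # The thread states that only the http scheme is supported for now.
    if parsed.scheme != "http" or not parsed.netloc:
        raise ValueError(f"expected an http:// URL, got {addr!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[ENABLE_VAR] = "1"
    env[ADDR_VAR] = addr
    return env


# Example address is an assumption; any reachable HTTP endpoint would do.
env = build_event_export_env("http://127.0.0.1:8080", base_env={})
print(env)
```

The resulting mapping could then be passed as `env=` to `subprocess.run(["ray", "start", "--head"], ...)` on each node, assuming the variables are read at node start as the doc text says.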

Events are JSON objects in the POST request body.

All events contains the same base fields and different event specific fields.
See ``src/ray/protobuf/events_base_event.proto`` for the base fields.
Collaborator

can you create a link to the file?

Comment on lines 100 to 102
"2":"2025-09-03T18:52:14.467402Z",
"1":"2025-09-03T18:52:14.467290Z",
"5":"2025-09-03T18:52:14.469074Z"
Collaborator

oh, we show 1,2,3 here instead of the state string?


For each task, Ray exports two types of events: Task Definition Event and Task Execution Event.

* Task Definition Event generated once per task attempt. It contains the metadata of the task. See ``src/ray/protobuf/events_task_definition_event.proto`` and ``src/ray/protobuf/actor_task_definition_event.proto`` for the event format for normal tasks and actor tasks respectively.
Contributor

nit: src/ray/protobuf/public/events_task_definition_event.proto and others (src/ray/protobuf/events_task_definition_event.proto)

"taskExecutionEvent":{
"taskId":"yO9FzNARJXH///////////////8BAAAA",
"taskState":{
"2":"2025-09-03T18:52:14.467402Z",
Contributor

qq: what states do these numbers (2, 1, etc.) map to?

Contributor Author

They map to the TaskStatus in common.proto

Contributor

maybe we can point to that file next to one of these numbers in a comment
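Until the doc carries such a pointer, a consumer can at least order the numeric states by timestamp. The sketch below uses the three `taskState` entries quoted in this thread; the function name `ordered_task_states` and the optional `names` mapping are hypothetical, and the code-to-name table is deliberately left to the caller, who would fill it in from `TaskStatus` in common.proto.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def ordered_task_states(task_state, names=None):
    """Return (state, timestamp) pairs sorted by when each state was reached.

    `names` is an optional {int: str} table the caller builds from
    TaskStatus in common.proto; without it the numeric codes are returned.
    """
    entries = [
        (int(code), datetime.strptime(ts, TIMESTAMP_FORMAT))
        for code, ts in task_state.items()
    ]
    entries.sort(key=lambda entry: entry[1])
    names = names or {}
    return [(names.get(code, code), ts) for code, ts in entries]


# The taskState map exactly as it appears in the example under review.
task_state = {
    "2": "2025-09-03T18:52:14.467402Z",
    "1": "2025-09-03T18:52:14.467290Z",
    "5": "2025-09-03T18:52:14.469074Z",
}
print([code for code, _ in ordered_task_states(task_state)])  # [1, 2, 5]
```

Keeping the name table outside the helper avoids hard-coding enum values that may change while the feature is in alpha.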

variable to ``1`` when starting each Ray worker node.

To set the target HTTP end point, you need to set the ``RAY_events_export_addr``
environment variable. The value should be a valid HTTP URL with the scheme of ``http``.
Contributor

I wonder if we can add a simple example of how to set up a local HTTP endpoint for testing, etc.

Contributor Author

Good point! I think later we can add an end-to-end example for the events once the configs are finalized. Added an action item in the followup GitHub issue: #54515
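In the meantime, a local receiver for testing can be sketched with the Python standard library alone. This is only an illustration under assumptions: the thread says events arrive as JSON in the POST request body, but the request path, batching, and exact payload shape are not specified here, so the handler accepts any path and stores whatever JSON it gets. The sample payload below is made up from the field names quoted in this PR.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # parsed request bodies, kept in memory for inspection


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))
            self.send_response(200)
        except ValueError:  # covers invalid JSON and invalid UTF-8
            self.send_response(400)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


def start_server(host="127.0.0.1", port=0):
    """Start the receiver in a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_server()
url = f"http://127.0.0.1:{server.server_port}"

# Simulate one export request with a made-up payload.
sample = {"taskExecutionEvent": {"taskState": {"1": "2025-09-03T18:52:14.467290Z"}}}
request = urllib.request.Request(
    url,
    data=json.dumps(sample).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    status = response.status

server.shutdown()
server.server_close()
print(status, received)
```

To receive real events, one would instead bind a fixed port, keep the server running, and set `RAY_events_export_addr` to that `http://` address.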

MengjinYan and others added 2 commits September 3, 2025 14:02
@ray-gardener (bot) added the docs and core labels Sep 4, 2025
@MengjinYan
Contributor Author

@dayshah Wondering if you can help to review from doc's perspective? Thanks!


Events are JSON objects in the POST request body.

All events contains the same base fields and different event specific fields.
See `src/ray/protobuf/events_base_event.proto <https://github.com/ray-project/ray/blob/master/src/ray/protobuf/events_base_event.proto>`_ for the base fields.
Collaborator

@angelinalg @dstrodtman do you know how I can make the link point to the right version? For example, if the doc is for 2.49, I want to point to ray/blob/releases-2.49/xxx, and if the doc is for master, I want to point to ray/blob/master/xxx.

Contributor

@can-anyscale, Sep 4, 2025

this is merged now so you can point this to the public directory as well
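On the version-specific link question in this thread, the mapping itself is simple to express. The sketch below is hypothetical: the function names are invented, and the `releases-<version>` pattern is copied from the question rather than checked against the repository's actual branch names.

```python
GITHUB_BLOB = "https://github.com/ray-project/ray/blob"


def ref_for_doc_version(doc_version):
    """Map a docs version to a git ref, following the pattern in the question.

    Assumption: "master" docs link to master, anything else to
    "releases-<version>"; verify against the real branch naming.
    """
    if doc_version == "master":
        return "master"
    return f"releases-{doc_version}"


def source_url(doc_version, path):
    """Build a GitHub link to `path` for the given docs version."""
    return f"{GITHUB_BLOB}/{ref_for_doc_version(doc_version)}/{path}"


proto = "src/ray/protobuf/public/events_task_execution_event.proto"
print(source_url("master", proto))
print(source_url("2.49", proto))
```

In a Sphinx build, a base URL computed this way could be fed to the `extlinks` setting of `sphinx.ext.extlinks`; whether the Ray docs are set up that way is not stated in this conversation.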

can-anyscale added a commit that referenced this pull request Sep 4, 2025
Move events_base_event_proto to the public proto directory. Will merge
and update doc after #56203.

Test:
- CI

Signed-off-by: Cuong Nguyen <[email protected]>
MengjinYan and others added 2 commits September 4, 2025 15:14
Signed-off-by: Mengjin Yan <[email protected]>
Ray Event Export
================

Ray supports exporting structured events to a configured HTTP endpoint. Each worker node
Contributor

Specify whether this is in the past or from what version? It sounds a little odd because the next sentence says from 2.49.

Also link the old feature.

Contributor Author

Good point. Will do.

generated during the task execution.
See `src/ray/protobuf/public/events_task_execution_event.proto <https://github.com/ray-project/ray/blob/master/src/ray/protobuf/public/events_task_execution_event.proto>`_ for the event format.

An example of the task events is as follows:
Contributor

Suggested change
An example of the task events is as follows:
An example of a Task Definition Event followed by a Task Execution Event:

? I don't know if this is right

Contributor Author

yeah I mainly want to provide a Task Definition Event and a Task Execution Event. Probably I'll update the doc to that.

@MengjinYan MengjinYan requested a review from a team as a code owner September 5, 2025 21:38
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
Move events_base_event_proto to the public proto directory. Will merge
and update doc after ray-project#56203.

Test:
- CI

Signed-off-by: Cuong Nguyen <[email protected]>
Signed-off-by: sampan <[email protected]>
@jjyao added the go label (add ONLY when ready to merge, run all tests) Sep 8, 2025
@jjyao jjyao enabled auto-merge (squash) September 8, 2025 20:37
@jjyao jjyao merged commit 010791e into ray-project:master Sep 8, 2025
5 of 6 checks passed
Labels
  • core: Issues that should be addressed in Ray Core
  • docs: An issue or change related to documentation
  • go: add ONLY when ready to merge, run all tests