Skip to content

feat: metrics #629

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

feat: metrics #629

wants to merge 11 commits into from

Conversation

anunaym14
Copy link
Member

No description provided.

@anunaym14 anunaym14 requested a review from theomonnom April 30, 2025 09:43
Copy link
Contributor

ilo-nanpa bot commented Apr 30, 2025

it seems like you haven't added any nanpa changeset files to this PR.

if this pull request includes changes to code, make sure to add a changeset, by writing a file to .nanpa/<unique-name>.kdl:

minor type="added" "Introduce frobnication algorithm"

refer to the manpage for more information.

@anunaym14 anunaym14 changed the title Feat/metrics feat: metrics Apr 30, 2025
Copy link
Member

@theomonnom theomonnom May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not copy this file, it is going to be very hard to keep it up to date since it's a copy (we will also have to release a new FFI versions each time we add a metric).

Instead, let's build the livekit metrics protocol on the Python side and serialize it before sending it to the FFI. The FFI is just going to be a passthrough for sending it over the DC.

You can take this example on how you can send raw bytes from Python:

required uint64 data_ptr = 2;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, please check.

Comment on lines 310 to 314
required uint64 local_participant_handle = 1;
repeated string str_data = 2;
repeated TimeSeriesMetric time_series = 3;
repeated EventMetric events = 4;
optional uint64 async_id = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will then become

Suggested change
required uint64 local_participant_handle = 1;
repeated string str_data = 2;
repeated TimeSeriesMetric time_series = 3;
repeated EventMetric events = 4;
optional uint64 async_id = 5;
required uint64 local_participant_handle = 1;
required uint64 data_ptr = 2;
required uint64 data_len = 3;
optional uint64 async_id = 4;

..Default::default()
};

let _ = rtc_engine.publish_data(data_packet, DataPacketKind::Reliable).await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't ignore errors, this function should return a Result

Copy link
Member Author

@anunaym14 anunaym14 May 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled, please check again

) {
cancellation_token
.run_until_cancelled(async move {
std::thread::sleep(Duration::from_secs(1));
Copy link
Member

@theomonnom theomonnom May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is blocking the event loop, use a tokio sleep instead or an async interval instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

pub struct Room {
inner: Arc<RoomSession>,
cancellation_token: tokio_util::sync::CancellationToken
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can avoid the cancellation token. This seems unnecessary.

We can just keep track of the handle returned by tokio::spawn and await it on close.

To "close/cancel" the task, you can use a tokio::select pattern

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it better, removed cancellation token, please have another look.

&self,
room: &Room,
rtc_engine: &RtcEngine,
cancellation_token: CancellationToken,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cancellation can be done beforehand directly in the task inside Room using tokio::select!

This function should be cancel safe:
https://docs.rs/tokio/latest/tokio/macro.select.html#cancellation-safety

room.local_participant().identity().into(),
);

let _ = self.send_metrics(rtc_engine, strings, subscriber_ts_metrics).await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, let's return a Result

fn create_time_series(
&self,
label: MetricLabel,
strings: &mut Vec<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this need to be a mutable reference? It seems like the "caller" doesn't expect it? So maybe it's fine to clone inside this method and keep the modifications in this scope

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes actually:

room.local_participant().identity().into(),
);

let _ = self.send_metrics(rtc_engine, strings, publisher_ts_metrics).await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, let's return a Result

@@ -20,6 +20,8 @@ OUT_RUST=src
protoc \
-I=$PROTOCOL \
--prost_out=$OUT_RUST \
--prost_opt=compile_well_known_types \
--prost_opt=extern_path=.google.protobuf=::pbjson_types \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need json types? Seems like we never serialize to json?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not anymore, livekit-protocol was converting .google.protobuf types to pbjson_types but livekit-ffi was not doing so, it was messing up timestamps in metrics.proto. Since we have removed metrics proto from ffi, we don't need this anymore.

@anunaym14 anunaym14 marked this pull request as ready for review May 12, 2025 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants