Implement metrics using otel #1885

MarcusSorealheis · 2025-07-30T12:46:32Z

Description

This PR continues the cache metrics effort from #1804, consolidating cache and execution metrics into one PR.

It still may not be adopted.

Fixes # (issue)

Type of change

Please delete options that aren't relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please also list any relevant details for your test configuration

Checklist

Updated documentation if needed
Tests added/amended
bazel test //... passes locally
PR is contained in a single commit, using git amend (see some docs)

nativelink-scheduler/src/memory_awaited_action_db.rs

chrisstaite-menlo · 2025-09-05T13:21:04Z

nativelink-scheduler/src/memory_awaited_action_db.rs line 649 at r1 (raw file):

                // Update active count for old stage
                let old_stage_attrs = match old_stage {

I feel like it would be much nicer to have something like:

let old_stage_attrs = vec![opentelemetry::KeyValue::new(
    nativelink_util::metrics::EXECUTION_STAGE,
    old_stage.into(),
)]

And then impl From<ActionStage> for ExecutionStage if it's not already been done.
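A minimal sketch of the conversion suggested above, under loud assumptions: the two enums here are simplified stand-ins (the real ActionStage and ExecutionStage live in the NativeLink crates and may have different variants and payloads), and stage_attrs with its "execution_stage" key is a hypothetical helper that stands in for building the opentelemetry::KeyValue list. Implementing the conversion for a reference as well as by value is one way to avoid cloning the stage just to label a metric.

```rust
// Stand-in types for illustration only; the real definitions may differ.

/// Hypothetical, simplified scheduler stage (payloads reduced to an exit code).
#[derive(Debug, Clone, PartialEq)]
pub enum ActionStage {
    Unknown,
    CacheCheck,
    Queued,
    Executing,
    Completed(i32),
    CompletedFromCache(i32),
}

/// Hypothetical metric label enum with no payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionStage {
    Unknown,
    CacheCheck,
    Queued,
    Executing,
    Completed,
}

impl ExecutionStage {
    /// Label text used as the attribute value (names are assumptions).
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Unknown => "unknown",
            Self::CacheCheck => "cache_check",
            Self::Queued => "queued",
            Self::Executing => "executing",
            Self::Completed => "completed",
        }
    }
}

// Borrowing conversion: no clone of the (possibly large) stage is needed.
impl From<&ActionStage> for ExecutionStage {
    fn from(stage: &ActionStage) -> Self {
        match stage {
            ActionStage::Unknown => Self::Unknown,
            ActionStage::CacheCheck => Self::CacheCheck,
            ActionStage::Queued => Self::Queued,
            ActionStage::Executing => Self::Executing,
            // Both completion variants collapse to one label in this sketch.
            ActionStage::Completed(_) | ActionStage::CompletedFromCache(_) => Self::Completed,
        }
    }
}

// By-value conversion, so `old_stage.into()` also works on an owned stage.
impl From<ActionStage> for ExecutionStage {
    fn from(stage: ActionStage) -> Self {
        Self::from(&stage)
    }
}

/// Hypothetical helper: a one-element attribute list as (key, value) pairs.
pub fn stage_attrs(stage: &ActionStage) -> Vec<(&'static str, &'static str)> {
    vec![("execution_stage", ExecutionStage::from(stage).as_str())]
}
```

With this in place, the per-variant match at the call site reduces to a single call such as stage_attrs(&old_stage).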

chrisstaite-menlo

Reviewable status: 0 of 1 LGTMs obtained, and 0 of 5 files reviewed, and 3 discussions need to be resolved

nativelink-scheduler/src/memory_awaited_action_db.rs line 704 at r1 (raw file):

                // Record completion metrics
                if let ActionStage::Completed(action_result) = new_stage {
                    let result_attrs = if action_result.exit_code == 0 {

let result_attrs = vec![opentelemetry::KeyValue::new(
    nativelink_util::metrics::EXECUTION_RESULT,
    if action_result.exit_code == 0 { ExecutionResult::Success } else { ExecutionResult::Failure },
)]

MarcusSorealheis

Reviewable status: 0 of 1 LGTMs obtained, and 0 of 5 files reviewed, and 3 discussions need to be resolved

nativelink-scheduler/src/memory_awaited_action_db.rs line 649 at r1 (raw file):

Previously, chrisstaite-menlo (Chris Staite) wrote…

I feel like it would be much nicer to have something like:

let old_stage_attrs = vec![opentelemetry::KeyValue::new(
    nativelink_util::metrics::EXECUTION_STAGE,
    old_stage.into(),
)]

And then impl From<ActionStage> for ExecutionStage if it's not already been done.

I

nativelink-scheduler/src/memory_awaited_action_db.rs line 704 at r1 (raw file):

Previously, chrisstaite-menlo (Chris Staite) wrote…

let result_attrs = vec![opentelemetry::KeyValue::new(
    nativelink_util::metrics::EXECUTION_RESULT,
    if action_result.exit_code == 0 { ExecutionResult::Success } else { ExecutionResult::Failure },
)]

I've also finally fixed this one. The CompletedFromCache case remains separate, though, since it uses a different condition (checking the ActionStage variant rather than the exit code) and maps to ExecutionResult::CacheHit.
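A rough sketch of the split described above, assuming simplified stand-in types: ActionResult here carries only an exit code, CompletedFromCache reuses it for brevity (the real variant may hold a different payload), and execution_result is a hypothetical helper name. The point it illustrates is that Completed is classified by exit code while CompletedFromCache is classified by the variant alone.

```rust
/// Hypothetical metric label for how an execution finished.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionResult {
    Success,
    Failure,
    CacheHit,
}

/// Stand-in for the real action result; only the exit code is modeled.
pub struct ActionResult {
    pub exit_code: i32,
}

/// Stand-in for the scheduler stage; non-terminal stages are reduced to one.
pub enum ActionStage {
    Executing,
    Completed(ActionResult),
    CompletedFromCache(ActionResult),
}

/// Returns the result label for terminal stages, or None for stages
/// that have not finished yet.
pub fn execution_result(stage: &ActionStage) -> Option<ExecutionResult> {
    match stage {
        // Ran on a worker: success or failure is decided by the exit code.
        ActionStage::Completed(result) => Some(if result.exit_code == 0 {
            ExecutionResult::Success
        } else {
            ExecutionResult::Failure
        }),
        // Served from cache: decided by the variant, exit code is not consulted.
        ActionStage::CompletedFromCache(_) => Some(ExecutionResult::CacheHit),
        _ => None,
    }
}
```

Keeping both cases in one match makes the difference in conditions visible in a single place, even though they cannot share the exit-code expression.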

MarcusSorealheis · 2025-09-06T10:36:06Z

I'm working on the docs. Should be done sometime tonight.

MarcusSorealheis · 2025-09-06T10:36:15Z

or tomorrow

MarcusSorealheis · 2025-09-06T14:09:52Z

To check the docs you can run:

cd web/platform
bun setup
bun docs
rm -r dist && bun run build
bun preview

It's a lot of content to review, but it should help people get started. It covers only a basic setup; there's a lot more you can do with the dashboard configs, metrics server/backend (disallowed by our linter) configs, etc.

chrisstaite-menlo

@chrisstaite-menlo reviewed 4 of 5 files at r1, 1 of 2 files at r4, 1 of 1 files at r6.
Reviewable status: 0 of 1 LGTMs obtained, and 3 of 17 files reviewed, and 1 discussions need to be resolved

MarcusSorealheis had a problem deploying to production July 30, 2025 12:46 — with GitHub Actions Error

MarcusSorealheis temporarily deployed to production July 30, 2025 12:47 — with GitHub Actions Inactive

MarcusSorealheis temporarily deployed to production July 30, 2025 15:58 — with GitHub Actions Inactive

MarcusSorealheis marked this pull request as draft August 1, 2025 18:17

MarcusSorealheis temporarily deployed to production August 13, 2025 05:20 — with GitHub Actions Inactive

MarcusSorealheis had a problem deploying to production August 13, 2025 05:20 — with GitHub Actions Failure

MarcusSorealheis temporarily deployed to production August 13, 2025 05:20 — with GitHub Actions Inactive

MarcusSorealheis temporarily deployed to production August 14, 2025 14:29 — with GitHub Actions Inactive

MarcusSorealheis temporarily deployed to production August 14, 2025 15:33 — with GitHub Actions Inactive

MarcusSorealheis requested review from palfrey, amankrx and chrisstaite-menlo August 14, 2025 15:52

MarcusSorealheis marked this pull request as ready for review August 14, 2025 15:53

palfrey reviewed Aug 20, 2025

nativelink-scheduler/src/memory_awaited_action_db.rs

chrisstaite-menlo requested changes Sep 5, 2025

MarcusSorealheis added 2 commits September 6, 2025 16:57

Implement metrics using otel

8c9fc22

from<ActionStage> trait

5ae7d49

MarcusSorealheis force-pushed the implement-remote-execution-metrics branch from 903b575 to 5ae7d49 Compare September 6, 2025 08:05

MarcusSorealheis had a problem deploying to production September 6, 2025 08:05 — with GitHub Actions Error

MarcusSorealheis temporarily deployed to production September 6, 2025 08:05 — with GitHub Actions Inactive

MarcusSorealheis had a problem deploying to production September 6, 2025 08:05 — with GitHub Actions Error

add tests and refactor expensive clone

2b38534

MarcusSorealheis temporarily deployed to production September 6, 2025 08:14 — with GitHub Actions Inactive

MarcusSorealheis had a problem deploying to production September 6, 2025 08:19 — with GitHub Actions Error

MarcusSorealheis temporarily deployed to production September 6, 2025 08:19 — with GitHub Actions Inactive

moved to the ternary operator

2960aa1

MarcusSorealheis temporarily deployed to production September 6, 2025 08:27 — with GitHub Actions Inactive

MarcusSorealheis temporarily deployed to production September 6, 2025 08:30 — with GitHub Actions Inactive

MarcusSorealheis commented Sep 6, 2025

nativelink-scheduler/src/memory_awaited_action_db.rs

add docs wrap otel impl

a7eb2d9

MarcusSorealheis temporarily deployed to production September 6, 2025 13:41 — with GitHub Actions Inactive

adds comprehensive metrics documentation

c48cc4b

MarcusSorealheis temporarily deployed to production September 6, 2025 14:08 — with GitHub Actions Inactive

chrisstaite-menlo reviewed Sep 8, 2025
