Skip to content

Conversation

MarcusSorealheis
Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis commented Jul 30, 2025

Description

This PR is a continuation of this cache metrics effort, consolidating cache and execution into one PR: #1804

It still may not be adopted.

Fixes # (issue)

Type of change

Please delete options that aren't relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to
    not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please also list any relevant details for your test configuration

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@MarcusSorealheis MarcusSorealheis marked this pull request as draft August 1, 2025 18:17
@MarcusSorealheis MarcusSorealheis marked this pull request as ready for review August 14, 2025 15:53
@chrisstaite-menlo
Copy link
Collaborator

nativelink-scheduler/src/memory_awaited_action_db.rs line 649 at r1 (raw file):

                // Update active count for old stage
                let old_stage_attrs = match old_stage {

I feel like it would be much nicer to have something like:

let old_stage_attrs = vec![opentelemetry::KeyValue::new(
                        nativelink_util::metrics::EXECUTION_STAGE,
                        old_stage.into(),
                    )]

And then impl From<ActionStage> for ExecutionStage if it's not already been done.

Copy link
Collaborator

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 0 of 5 files reviewed, and 3 discussions need to be resolved


nativelink-scheduler/src/memory_awaited_action_db.rs line 704 at r1 (raw file):

                // Record completion metrics
                if let ActionStage::Completed(action_result) = new_stage {
                    let result_attrs = if action_result.exit_code == 0 {
let result_attrs = vec![opentelemetry::KeyValue::new(
                            nativelink_util::metrics::EXECUTION_RESULT,
                            if action_result.exit_code == 0 { ExecutionResult::Success} else { ExecutionResult::Failure },
                        )]

Copy link
Collaborator Author

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 0 of 5 files reviewed, and 3 discussions need to be resolved


nativelink-scheduler/src/memory_awaited_action_db.rs line 649 at r1 (raw file):

Previously, chrisstaite-menlo (Chris Staite) wrote…

I feel like it would be much nicer to have something like:

let old_stage_attrs = vec![opentelemetry::KeyValue::new(
                        nativelink_util::metrics::EXECUTION_STAGE,
                        old_stage.into(),
                    )]

And then impl From<ActionStage> for ExecutionStage if it's not already been done.

I


nativelink-scheduler/src/memory_awaited_action_db.rs line 704 at r1 (raw file):

Previously, chrisstaite-menlo (Chris Staite) wrote…
let result_attrs = vec![opentelemetry::KeyValue::new(
                            nativelink_util::metrics::EXECUTION_RESULT,
                            if action_result.exit_code == 0 { ExecutionResult::Success} else { ExecutionResult::Failure },
                        )]

I've also finally fixed this one. Except, the CompletedFromCache case remains separate since it uses a different condition (checking the ActionStage variant rather than exit code) and maps to ExecutionResult::CacheHit.

@MarcusSorealheis
Copy link
Collaborator Author

I'm working on the docs. Should be done sometime tonight.

@MarcusSorealheis
Copy link
Collaborator Author

or tomorrow

@MarcusSorealheis
Copy link
Collaborator Author

To check the docs you can run:

cd web/platform
bun setup
bun docs
rm -r dist && bun run build
bun preview

It's a lot of content to review but should help people get started. It's only a basic setup. There's a lot more you can do with the dashboard configs, metrics server/backend (disallowed by our linter) configs, etc.

Copy link
Collaborator

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisstaite-menlo reviewed 4 of 5 files at r1, 1 of 2 files at r4, 1 of 1 files at r6.
Reviewable status: 0 of 1 LGTMs obtained, and 3 of 17 files reviewed, and 1 discussions need to be resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants