create baselines #291

PaliC · 2025-06-04T18:50:47Z

This PR adds milestones to the leaderboard. Milestones are defined in a problem's task.yml; see examples/matmul_py/task.yml for an example.

The idea is that these milestones give us:

Benchmarks that folks can compete against.
A reference point from which we can easily pull stats, for example how many submissions there were or what the average speedup against the AMD reference was.
A way to reference these baselines without a hacky workaround (such as making our own submission).
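As a rough illustration of the kind of stats a milestone makes easy to pull, here is a minimal sketch. The function names and the assumption that runtimes are plain floats in seconds are hypothetical and not taken from this PR; the real leaderboard data model may differ.

```python
from statistics import mean
from typing import Dict, List, Optional, Union


def speedups_vs_milestone(
    milestone_seconds: float, submission_seconds: List[float]
) -> List[float]:
    """Speedup of each submission relative to the milestone (> 1 means faster)."""
    if milestone_seconds <= 0:
        raise ValueError("milestone runtime must be positive")
    # Skip non-positive runtimes, which cannot be valid measurements.
    return [milestone_seconds / t for t in submission_seconds if t > 0]


def summarize(
    milestone_seconds: float, submission_seconds: List[float]
) -> Dict[str, Optional[Union[int, float]]]:
    """Count submissions and compute the arithmetic-mean speedup."""
    speedups = speedups_vs_milestone(milestone_seconds, submission_seconds)
    return {
        "submissions": len(speedups),
        "beating_milestone": sum(s > 1.0 for s in speedups),
        "mean_speedup": mean(speedups) if speedups else None,
    }
```

The arithmetic mean is used here only because the description says "average"; a geometric mean is a common alternative for ratios.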

The main change is that creating a leaderboard now triggers runs that add the task's milestones to it. The PR also adds admin cog commands to rerun and view milestones manually (for testing purposes). The screenshots below show this working. (You can also create a leaderboard and test it yourself.)

copilot summary

This pull request introduces major updates to the milestone submission system and leaderboard functionality, along with additional enhancements to task files and error handling. The changes aim to streamline milestone management, improve code modularity, and enhance user experience in Discord-based workflows.

Milestone Management and Submission Enhancements:

Auto-submission of milestones for new leaderboards: Added logic to automatically submit milestones when a leaderboard is created, including database synchronization and GPU-based task execution. (src/discord-cluster-manager/cogs/admin_cog.py, R384-R527)
New Discord commands for milestone management: Introduced commands such as submit-milestones, list-milestones, and milestone-results to manage and view milestone submissions directly from Discord. (src/discord-cluster-manager/cogs/admin_cog.py, R1185-R1368)

Task File Updates:

Added PyTorch and torch.mm reference implementations: Created new files pytorch_ref.py and torch_mm_ref.py to provide baseline performance implementations for matrix multiplication tasks. (examples/matmul_py/pytorch_ref.py; examples/matmul_py/torch_mm_ref.py)
Updated task.yml to include milestones: Defined milestones for the new reference implementations, enabling automated performance benchmarking. (examples/matmul_py/task.yml, R9-R22)

Leaderboard Functionality Improvements:

Support for milestone mode in leaderboard submissions: Enhanced the submission logic to handle milestone mode, allowing submissions without requiring user-uploaded scripts. (src/discord-cluster-manager/cogs/leaderboard_cog.py, L66-R101)
Improved error handling for leaderboard submissions: Updated error messages to provide clearer context when a leaderboard is not found during submission. (src/discord-cluster-manager/api/main.py, L359-R363)

General Improvements:

Enhanced dependency management in workflows: Fixed duplicate jq installation commands in .github/workflows/nvidia_workflow.yml. (.github/workflows/nvidia_workflow.yml, R31-R34)
Minor refactor and import updates: Added missing imports and cleaned up unused code in various files. (src/discord-cluster-manager/cogs/admin_cog.py; src/discord-cluster-manager/cogs/leaderboard_cog.py)

These updates collectively enhance the usability and functionality of the milestone and leaderboard systems, making it easier for users to benchmark and manage submissions efficiently.

Copilot

Pull Request Overview

Adds a new “milestone” feature that lets tasks define baselines, tracks their runs in the database, and exposes CLI/API/Discord commands for submitting and reporting milestone runs.

Introduce milestones field on tasks and include it in generated configs
Extend the DB schema and data-access code for milestones and milestone runs
Update submission logic, Discord cogs, API handlers, and example tasks/workflows to support the new mode

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
utils.py	Include `milestones` in task‐config payload
task.py	Add `milestones` field to `LeaderboardTask`
submission.py	Extend `prepare_submission` signature to take a `mode`
run_eval.py	Treat `"milestone"` as a valid mode in eval runner
report.py	Add short‐report entries for milestone results
migrations/…add-milestone-table.py	Create `milestones` and `milestone_runs` tables
leaderboard_db.py	CRUD methods for milestones and runs + cleanup in delete
consts.py	New `SubmissionMode.MILESTONE`, system‐user helpers
cogs/submit_cog.py	Handle milestone submissions end-to-end in Discord cog
cogs/leaderboard_cog.py	Import and use lookup, added system user for milestones
cogs/admin_cog.py	Commands for listing and submitting milestones, auto-submit
api/utils.py	Pass `mode` into `prepare_submission`
api/main.py	Enrich 404 message when running a missing leaderboard
examples/matmul_py/*	Add two reference implementations and milestone definitions
examples/eval.py	Treat `"milestone"` identically to `"leaderboard"`
.github/workflows/nvidia_workflow.yml	Install `jq` twice (duplicate)

Comments suppressed due to low confidence (5)

src/discord-cluster-manager/cogs/submit_cog.py:180

The variable name is not defined in this scope. Did you mean to use filename (the submission filename) instead?

milestone = next((m for m in milestones if m["filename"] == name),
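The quoted line is cut off, so the actual fix is not visible here. As a minimal sketch of the lookup pattern the comment points at, assuming milestones are dicts with a "filename" key (as the quoted line suggests) and using a hypothetical helper name:

```python
from typing import Dict, List, Optional


def find_milestone(milestones: List[Dict], filename: str) -> Optional[Dict]:
    """Return the first milestone whose "filename" matches, or None.

    Passing None as the default to next() avoids a StopIteration
    when no milestone matches.
    """
    return next((m for m in milestones if m.get("filename") == filename), None)
```

The caller can then handle the `None` case explicitly, for instance by reporting that the milestone is unknown.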

src/discord-cluster-manager/cogs/submit_cog.py:176

You're passing the tuple returned by get_system_user_name(None) into the user_name column. Unpack it so you pass the ID and the name string separately.

(str(SYSTEM_USER_ID), get_system_user_name(None)),

.github/workflows/nvidia_workflow.yml:33

[nitpick] You install jq twice (lines 33 and 36) using both apt and apt-get. Consolidate to a single install step.

apt update && apt install -y jq

src/discord-cluster-manager/leaderboard_db.py:243

New methods for milestone CRUD (e.g. create_milestone, get_leaderboard_milestones, record_milestone_run, get_milestone_runs) should have accompanying unit tests to ensure correct behavior.

def create_milestone(

examples/matmul_py/reference.py:23

The signature of check_implementation originally returned a single string or empty string; changing it to a tuple may break callers expecting a string. Make sure the consumer is updated to handle (bool, str).

return False, "mismatch found! custom implementation doesn't match reference.: " + reasons[0]
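One way a consumer could tolerate both return shapes during a transition is sketched below. The helper name is hypothetical, and treating an empty string as "passed" in the old convention is an assumption based on the comment above, not something confirmed by the diff.

```python
from typing import Tuple, Union


def normalize_check_result(result: Union[str, Tuple[bool, str]]) -> Tuple[bool, str]:
    """Convert either return convention into (ok, message).

    Old style (assumed): a string, where "" means the check passed.
    New style: an (ok, message) tuple.
    """
    if isinstance(result, tuple):
        ok, message = result
        return bool(ok), message
    # Old style: an empty string signals success, anything else is the error.
    return result == "", result
```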

ngc92 · 2025-06-15T13:41:00Z

examples/matmul_py/task.yml

@@ -6,6 +6,20 @@ files:
  - {"name": "utils.py", "source": "../utils.py"}
  - {"name": "reference.py", "source": "reference.py"}
  - {"name": "eval.py", "source": "../eval.py"}
+  - {"name": "pytorch_ref.py", "source": "pytorch_ref.py"}


Milestones shouldn't be added here. These files get sent to the runner for every submission.

PaliC · 2025-06-16

Oh, is there another place to configure problems, or not really?

ngc92 · 2025-06-15T14:03:31Z

src/discord-cluster-manager/cogs/admin_cog.py

@@ -366,7 +381,149 @@ async def create_leaderboard_in_db(
                    ephemeral=True,
                )
                return False
-            return True
+
+        # Check if the task has milestones and automatically submit them


This should be a separate function. create_leaderboard_in_db should do what the name says, having it spawn stuff on runners would be rather surprising.
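A minimal sketch of the separation being asked for, under loud assumptions: the functions below are simplified stand-ins (a dict replaces the real database, and `asyncio.sleep(0)` replaces launching a runner job), and `submit_milestones` and `create_leaderboard` are hypothetical names, not the ones in this PR.

```python
import asyncio
from typing import Dict, List


async def create_leaderboard_in_db(name: str, db: Dict[str, dict]) -> bool:
    """Only record the leaderboard; never touch runners (stand-in)."""
    if name in db:
        return False
    db[name] = {"milestones_submitted": False}
    return True


async def submit_milestones(name: str, milestones: List[dict], db: Dict[str, dict]) -> int:
    """Hypothetical helper that would trigger one run per milestone."""
    for _ in milestones:
        await asyncio.sleep(0)  # placeholder for launching a run
    db[name]["milestones_submitted"] = True
    return len(milestones)


async def create_leaderboard(name: str, milestones: List[dict], db: Dict[str, dict]) -> bool:
    """Caller composes the two steps, so the side effect is explicit."""
    if not await create_leaderboard_in_db(name, db):
        return False
    if milestones:
        await submit_milestones(name, milestones, db)
    return True
```

With this split, the database function does only what its name says, and the admin commands that rerun milestones could call the submit helper directly.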

PaliC and others added 8 commits June 4, 2025 14:50

create baselines

96627de

create baselines

e8e0351

push to test workflow

9ffc5b6

save work

a3fe891

Merge branch 'main' into palic/milestones

f5503f2

cleanup

6cde161

remove milestone deletion

341cbc9

lint

33c9f82

PaliC marked this pull request as ready for review June 13, 2025 17:58

Copilot AI review requested due to automatic review settings June 13, 2025 17:58

Copilot AI reviewed Jun 13, 2025

PaliC requested review from msaroufim, ngc92 and S1ro1 June 13, 2025 18:04

ngc92 reviewed Jun 15, 2025
