create baselines #291

Open
wants to merge 8 commits into
base: main

PaliC
Collaborator

@PaliC PaliC commented Jun 4, 2025

This PR adds milestones to the leaderboard. These are defined in the task.yml of a problem. An example of this can be found in examples/matmul/task.yml.

The ideas behind these milestones are:

  1. They are benchmarks that folks want to compete against.
  2. They give us a reference point from which we can easily pull stats, for example how many submissions there were or what the average speedup against the AMD reference was.
  3. We don't have to do something hacky (such as making our own submission) when referencing these milestones.
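As a rough illustration of point 2, the sketch below computes a submission count and an average speedup against a single milestone runtime. The function name `speedup_stats` and the input shape (plain runtimes in seconds) are assumptions made for this example; the PR does not show how stats are actually queried.

```python
from statistics import mean


def speedup_stats(milestone_time_s, submission_times_s):
    """Summarise submissions relative to one milestone runtime (hypothetical helper)."""
    if milestone_time_s <= 0:
        raise ValueError("milestone runtime must be positive")
    # A speedup above 1.0 means the submission ran faster than the milestone.
    speedups = [milestone_time_s / t for t in submission_times_s if t > 0]
    return {
        "count": len(speedups),
        "beating_milestone": sum(s > 1.0 for s in speedups),
        "mean_speedup": mean(speedups) if speedups else None,
    }


# Made-up numbers: a 2.0 s milestone and three submissions.
stats = speedup_stats(2.0, [1.0, 4.0, 2.0])
```

Here `stats["count"]` is 3 and exactly one submission beats the milestone.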

This PR primarily triggers runs when a leaderboard is created, so that its milestones are added to it at creation time. It also adds some admin cogs to rerun and view milestones manually (for testing purposes). Below are some screenshots showing that things work. (You can also create a leaderboard and test things yourself.)

[Image: Screenshot 2025-06-13 at 1 39 12 PM]
[Image: Screenshot 2025-06-13 at 1 39 58 PM]

copilot summary

This pull request introduces major updates to the milestone submission system and leaderboard functionality, along with additional enhancements to task files and error handling. The changes aim to streamline milestone management, improve code modularity, and enhance user experience in Discord-based workflows.

Milestone Management and Submission Enhancements:

  • Auto-submission of milestones for new leaderboards: Added logic to automatically submit milestones when a leaderboard is created, including database synchronization and GPU-based task execution. (src/discord-cluster-manager/cogs/admin_cog.py, R384-R527)
  • New Discord commands for milestone management: Introduced commands such as submit-milestones, list-milestones, and milestone-results to manage and view milestone submissions directly from Discord. (src/discord-cluster-manager/cogs/admin_cog.py, R1185-R1368)

Task File Updates:

  • Added PyTorch and torch.mm reference implementations: Created new files pytorch_ref.py and torch_mm_ref.py to provide baseline performance implementations for matrix multiplication tasks. (examples/matmul_py/pytorch_ref.py, examples/matmul_py/torch_mm_ref.py)
  • Updated task.yml to include milestones: Defined milestones for the new reference implementations, enabling automated performance benchmarking. (examples/matmul_py/task.yml, R9-R22)

Leaderboard Functionality Improvements:

  • Support for milestone mode in leaderboard submissions: Enhanced the submission logic to handle milestone mode, allowing submissions without requiring user-uploaded scripts. (src/discord-cluster-manager/cogs/leaderboard_cog.py, L66-R101)
  • Improved error handling for leaderboard submissions: Updated error messages to provide clearer context when a leaderboard is not found during submission. (src/discord-cluster-manager/api/main.py, L359-R363)

General Improvements:

  • Enhanced dependency management in workflows: Fixed duplicate jq installation commands in .github/workflows/nvidia_workflow.yml. (.github/workflows/nvidia_workflow.yml, R31-R34)
  • Minor refactor and import updates: Added missing imports and cleaned up unused code in various files. (src/discord-cluster-manager/cogs/admin_cog.py, src/discord-cluster-manager/cogs/leaderboard_cog.py)

These updates collectively enhance the usability and functionality of the milestone and leaderboard systems, making it easier for users to benchmark and manage submissions efficiently.

@PaliC PaliC marked this pull request as ready for review June 13, 2025 17:58
@Copilot Copilot AI review requested due to automatic review settings June 13, 2025 17:58
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds a new “milestone” feature that lets tasks define baselines, tracks their runs in the database, and exposes CLI/API/Discord commands for submitting and reporting milestone runs.

  • Introduce milestones field on tasks and include it in generated configs
  • Extend the DB schema and data-access code for milestones and milestone runs
  • Update submission logic, Discord cogs, API handlers, and example tasks/workflows to support the new mode

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| utils.py | Include milestones in task-config payload |
| task.py | Add milestones field to LeaderboardTask |
| submission.py | Extend prepare_submission signature to take a mode |
| run_eval.py | Treat "milestone" as a valid mode in eval runner |
| report.py | Add short-report entries for milestone results |
| migrations/…add-milestone-table.py | Create milestones and milestone_runs tables |
| leaderboard_db.py | CRUD methods for milestones and runs + cleanup in delete |
| consts.py | New SubmissionMode.MILESTONE, system-user helpers |
| cogs/submit_cog.py | Handle milestone submissions end-to-end in Discord cog |
| cogs/leaderboard_cog.py | Import and use lookup, added system user for milestones |
| cogs/admin_cog.py | Commands for listing and submitting milestones, auto-submit |
| api/utils.py | Pass mode into prepare_submission |
| api/main.py | Enrich 404 message when running a missing leaderboard |
| examples/matmul_py/* | Add two reference implementations and milestone definitions |
| examples/eval.py | Treat "milestone" identically to "leaderboard" |
| .github/workflows/nvidia_workflow.yml | Install jq twice (duplicate) |
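
The summary above mentions a new SubmissionMode.MILESTONE and says the eval script treats "milestone" identically to "leaderboard". A minimal sketch of that idea follows. Only the MILESTONE member and the two mode strings come from this PR; the `TEST` member, the `LEADERBOARD` member name, `RANKED_MODES`, and `uses_ranked_path` are assumptions for illustration, not the repository's actual code.

```python
from enum import Enum


class SubmissionMode(Enum):
    # TEST is an illustrative placeholder; the real enum may differ.
    TEST = "test"
    LEADERBOARD = "leaderboard"
    MILESTONE = "milestone"


# Modes that should follow the same ranked, timed evaluation path.
RANKED_MODES = {SubmissionMode.LEADERBOARD, SubmissionMode.MILESTONE}


def uses_ranked_path(mode: str) -> bool:
    """Return True when a mode string should be evaluated like a leaderboard run."""
    try:
        return SubmissionMode(mode) in RANKED_MODES
    except ValueError:
        # Unknown mode strings are simply not ranked in this sketch.
        return False
```

Keeping the shared modes in one set means a future mode only has to be added in one place rather than in every `if mode == ...` check.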
Comments suppressed due to low confidence (5)

src/discord-cluster-manager/cogs/submit_cog.py:180

  • The variable `name` is not defined in this scope. Did you mean to use `filename` (the submission filename) instead?
milestone = next((m for m in milestones if m["filename"] == name),
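
For reference, a lookup of this kind might be written as the sketch below, with the search key passed in explicitly so no undefined name is involved. The helper name `find_milestone`, the `None` default, and the `"name"` key in the sample data are assumptions; only the `"filename"` key and the two file names appear in this PR.

```python
def find_milestone(milestones, filename):
    """Return the first milestone dict whose "filename" matches, or None (hypothetical helper)."""
    return next((m for m in milestones if m.get("filename") == filename), None)


# Sample data shaped like the milestone entries discussed in this PR.
milestones = [
    {"name": "pytorch reference", "filename": "pytorch_ref.py"},
    {"name": "torch.mm reference", "filename": "torch_mm_ref.py"},
]
found = find_milestone(milestones, "torch_mm_ref.py")
```

Passing a default to `next` avoids a `StopIteration` when no milestone matches, so the caller can handle the missing case explicitly.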

src/discord-cluster-manager/cogs/submit_cog.py:176

  • You're passing the tuple returned by get_system_user_name(None) into the user_name column. Unpack it so you pass the ID and the name string separately.
(str(SYSTEM_USER_ID), get_system_user_name(None)),

.github/workflows/nvidia_workflow.yml:33

  • [nitpick] You install jq twice (lines 33 and 36) using both apt and apt-get. Consolidate to a single install step.
apt update && apt install -y jq

src/discord-cluster-manager/leaderboard_db.py:243

  • New methods for milestone CRUD (e.g. create_milestone, get_leaderboard_milestones, record_milestone_run, get_milestone_runs) should have accompanying unit tests to ensure correct behavior.
def create_milestone(

examples/matmul_py/reference.py:23

  • The signature of check_implementation originally returned a single string or empty string; changing it to a tuple may break callers expecting a string. Make sure the consumer is updated to handle (bool, str).
return False, "mismatch found! custom implementation doesn't match reference.: " + reasons[0]
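
One hedged way a consumer could tolerate both return styles during the transition is sketched here. The helper `normalize_check_result` is hypothetical, and the convention that an empty string means success in the old style is an assumption based on the comment above.

```python
def normalize_check_result(result):
    """Coerce either return style into an (ok, message) pair (hypothetical helper)."""
    if isinstance(result, tuple):
        # New style: (bool, str).
        ok, message = result
        return bool(ok), str(message)
    # Assumed old style: empty string means success, anything else is the error text.
    return (result == "", result)


new_style = normalize_check_result((False, "mismatch found!"))
old_style_ok = normalize_check_result("")
old_style_bad = normalize_check_result("mismatch found!")
```

With such a shim, callers can always unpack two values regardless of which version of the reference file they were given.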

@PaliC PaliC requested review from msaroufim, ngc92 and S1ro1 June 13, 2025 18:04
@@ -6,6 +6,20 @@ files:
- {"name": "utils.py", "source": "../utils.py"}
- {"name": "reference.py", "source": "reference.py"}
- {"name": "eval.py", "source": "../eval.py"}
- {"name": "pytorch_ref.py", "source": "pytorch_ref.py"}
Collaborator

Milestones shouldn't be added here. These files get sent to the runner for every submission.

Collaborator Author

ohh is there another place to config problems or not really?
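
To make the concern concrete, here is one possible shape, purely as a sketch: milestone sources live in their own list and are only added to the file set for a milestone run. The function `files_for_run` and the `"filename"`/`"source"` layout of the milestone entries are assumptions; the thread does not settle where milestones should be configured.

```python
def files_for_run(task, mode, milestone_filename=None):
    """Select the files to ship to the runner for one run (hypothetical helper)."""
    # Shared files go out with every submission.
    files = {f["name"]: f["source"] for f in task["files"]}
    if mode == "milestone":
        match = next(
            (m for m in task.get("milestones", []) if m["filename"] == milestone_filename),
            None,
        )
        if match is None:
            raise KeyError(f"unknown milestone file: {milestone_filename}")
        # Only the requested milestone's source is added, and only in milestone mode.
        files[match["filename"]] = match["source"]
    return files


task = {
    "files": [
        {"name": "utils.py", "source": "../utils.py"},
        {"name": "reference.py", "source": "reference.py"},
        {"name": "eval.py", "source": "../eval.py"},
    ],
    "milestones": [{"filename": "pytorch_ref.py", "source": "pytorch_ref.py"}],
}
regular = files_for_run(task, "leaderboard")
milestone = files_for_run(task, "milestone", "pytorch_ref.py")
```

In this layout a regular submission never carries the milestone sources, which is the behaviour the review comment asks for.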

@@ -366,7 +381,149 @@ async def create_leaderboard_in_db(
ephemeral=True,
)
return False
return True

# Check if the task has milestones and automatically submit them
Collaborator

This should be a separate function. create_leaderboard_in_db should do what the name says; having it spawn stuff on runners would be rather surprising.
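
A minimal sketch of the suggested split, under stated assumptions: the signatures below are simplified stand-ins (the real `create_leaderboard_in_db` takes different arguments), `db` is a plain dict, and `run_on_gpu` is a hypothetical coroutine that stands in for dispatching work to a runner.

```python
import asyncio


async def create_leaderboard_in_db(db, name, task):
    """Stand-in: only persist the leaderboard, no runner work happens here."""
    db[name] = task
    return True


async def submit_milestones(task, run_on_gpu):
    """Launch one run per milestone and collect the results in order."""
    runs = [run_on_gpu(m) for m in task.get("milestones", [])]
    return await asyncio.gather(*runs)


async def create_leaderboard(db, name, task, run_on_gpu):
    """Caller-level orchestration: create first, then submit milestones."""
    if not await create_leaderboard_in_db(db, name, task):
        return None
    return await submit_milestones(task, run_on_gpu)


async def fake_run(milestone):
    # Pretend runner used only for this example.
    return milestone["filename"] + ":ok"


db = {}
results = asyncio.run(
    create_leaderboard(db, "matmul", {"milestones": [{"filename": "pytorch_ref.py"}]}, fake_run)
)
```

The database function stays free of side effects on runners, and the milestone step can also be called on its own, for example from a rerun command.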

Labels
None yet
Projects
None yet
2 participants