NeMo2.0 llama3 perf scripts #11702

Open
wants to merge 12 commits into
base: main

@malay-nagda malay-nagda commented Dec 23, 2024

What does this PR do?

Adds Llama 3 pre-training recipes with performance optimizations.

Collection: [llm]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

python3 scripts/llm/performance/llama3_8b.py -a <slurm_account> -p <slurm_partition> -i nvcr.io/nvidia/nemo:24.09
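
For readers who want to see how a launcher like this might consume the three flags in the usage line, here is a minimal sketch using `argparse`. Only the short flags `-a`, `-p`, and `-i` and the container tag come from the command above; the long option names, the help strings, the default image, and the function name `parse_cli_args` are assumptions made for illustration and may not match the script in this PR.

```python
import argparse


def parse_cli_args(argv=None):
    """Parse the flags shown in the usage line (long names are assumed)."""
    parser = argparse.ArgumentParser(
        description="Launch a Llama 3 8B pre-training performance run (sketch)."
    )
    # -a: Slurm account to charge the job to.
    parser.add_argument("-a", "--account", required=True, help="Slurm account")
    # -p: Slurm partition to submit the job to.
    parser.add_argument("-p", "--partition", required=True, help="Slurm partition")
    # -i: container image; the default mirrors the tag in the usage line.
    parser.add_argument(
        "-i",
        "--container_image",
        default="nvcr.io/nvidia/nemo:24.09",
        help="Container image used for the run",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Placeholder values; replace with a real account and partition.
    args = parse_cli_args(["-a", "my_account", "-p", "my_partition"])
    print(args.account, args.partition, args.container_image)
```

Passing `argv` explicitly keeps the parser easy to exercise without touching `sys.argv`.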

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Malay Nagda <[email protected]>
@malay-nagda malay-nagda changed the title perf scripts llama3 8b NeMo2.0 perf scripts Dec 23, 2024
scripts/llm/performance/llama3_8b.py (fixed)
scripts/llm/performance/llama3_8b.py (fixed)
scripts/llm/performance/utils.py (fixed)
malay-nagda and others added 5 commits December 23, 2024 16:18
Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
@malay-nagda malay-nagda requested a review from erhoo82 December 23, 2024 15:07
malay-nagda and others added 2 commits December 23, 2024 21:02
Signed-off-by: Malay Nagda <[email protected]>
@malay-nagda malay-nagda marked this pull request as ready for review December 23, 2024 15:35
@malay-nagda malay-nagda changed the title NeMo2.0 perf scripts NeMo2.0 llama3 perf scripts Dec 23, 2024
@malay-nagda malay-nagda self-assigned this Dec 23, 2024
erhoo82 commented Dec 23, 2024

Maybe we should disable the TensorBoard logger by default and document how to enable it?
The TensorBoard logger adds performance overhead.
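
One way to act on this suggestion is an opt-in command-line flag, so that logging stays off unless the user asks for it. The sketch below is hypothetical: the flag name `-tb/--tensorboard`, the helper names, and the `tb_logs` subdirectory are all assumptions, and the code deliberately stops short of constructing a real logger object so that it does not guess at the NeMo API.

```python
import argparse
from typing import Optional


def add_tensorboard_flag(parser: argparse.ArgumentParser) -> None:
    """Register an opt-in flag; TensorBoard logging is off unless it is passed."""
    parser.add_argument(
        "-tb",
        "--tensorboard",
        action="store_true",  # defaults to False, so logging is disabled by default
        help="Enable TensorBoard logging (adds overhead; off by default)",
    )


def tensorboard_log_dir(enabled: bool, base_dir: str) -> Optional[str]:
    """Return a log directory when logging is enabled, otherwise None.

    The caller would attach a TensorBoard logger only when this is not None.
    """
    return f"{base_dir}/tb_logs" if enabled else None


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    add_tensorboard_flag(parser)
    args = parser.parse_args([])  # no flag given, so logging stays disabled
    print(tensorboard_log_dir(args.tensorboard, "/results/exp"))
```

The help string doubles as the documentation the comment asks for, since it appears in the script's `--help` output.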

name=exp_name,
plugins=[
    PerfEnvPlugin(enable_vboost=True),
    NsysPlugin(start_step=5, end_step=6),

Let's add a knob so that profiling is disabled by default.
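
A sketch of what such a knob could look like: build the plugin list in a helper and append the Nsight Systems plugin only on request. The two dataclasses below are stand-ins so the snippet runs without NeMo installed; they mirror only the constructor arguments visible in the diff above. The helper name `build_plugins` and the `enable_nsys` parameter are assumptions, not part of this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PerfEnvPlugin:
    """Stand-in for the real plugin; only the argument seen in the diff."""
    enable_vboost: bool = False


@dataclass
class NsysPlugin:
    """Stand-in for the real plugin; only the arguments seen in the diff."""
    start_step: int
    end_step: int


def build_plugins(
    enable_nsys: bool = False, start_step: int = 5, end_step: int = 6
) -> List[object]:
    """Return the run plugins, adding the profiler only when requested."""
    plugins: List[object] = [PerfEnvPlugin(enable_vboost=True)]
    if enable_nsys:
        # Profiling is opt-in; the step range defaults match the diff.
        plugins.append(NsysPlugin(start_step=start_step, end_step=end_step))
    return plugins


if __name__ == "__main__":
    print(build_plugins())                  # performance plugin only
    print(build_plugins(enable_nsys=True))  # profiler appended
```

In the actual script, `enable_nsys` would presumably be driven by a command-line flag, with the stand-in classes replaced by the real imports.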


[🤖]: Hi @malay-nagda 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it up to you what to do next.

//cc @pablo-garay @ko3n1g

@github-actions github-actions bot removed the NLP label Dec 24, 2024
2 participants