Support the parallel conversion from ZeRO checkpoints to FP32/FP16/BF16 param weight #6655

Open
wants to merge 7 commits into
base: master
from

Conversation

xylian86
Contributor

@xylian86 xylian86 commented Oct 23, 2024

This PR adds parallel conversion support to the zero_to_fp32 script using universal checkpoints, addressing the request in issue #6526: "The previous version was written 3 years ago models were small and converted fast. Now with 70B+ models the conversion can take hours."

  • Switch to universal checkpoint API
  • Support Frozen Parameters
  • Support Shared Parameters
  • Add support for output to safetensors
  • Add support for output to FP16/BF16
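
For readers unfamiliar with the idea, here is a minimal sketch of what a parallel, per-parameter conversion could look like. It is not the code in this PR and does not use any DeepSpeed API: the names `encode`, `merge_param`, and `convert`, and the in-memory `checkpoint` layout (parameter name mapped to per-rank fragments plus the true element count), are all hypothetical and chosen only for illustration. It assumes each rank holds a contiguous fragment of every flattened parameter, possibly with trailing padding.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# struct format codes for the two IEEE formats; bf16 is handled separately.
_FORMATS = {"fp32": "<f", "fp16": "<e"}


def encode(value: float, dtype: str) -> bytes:
    """Encode one value in the requested output precision (little-endian)."""
    if dtype == "bf16":
        # bf16 is the upper 16 bits of fp32; this sketch truncates, it does not round.
        return struct.pack("<f", value)[2:]
    return struct.pack(_FORMATS[dtype], value)


def merge_param(shards, numel, dtype):
    """Concatenate the per-rank fragments of one parameter and drop trailing padding."""
    flat = [v for shard in shards for v in shard][:numel]
    return b"".join(encode(v, dtype) for v in flat)


def convert(checkpoint, dtype="fp32", max_workers=4):
    """Convert every parameter independently (hypothetical layout).

    checkpoint maps a parameter name to (list of per-rank fragments, numel).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(merge_param, shards, numel, dtype)
            for name, (shards, numel) in checkpoint.items()
        }
        return {name: fut.result() for name, fut in futures.items()}
```

Because each parameter is merged independently, the work splits cleanly across workers. Threads are used here only to keep the sketch short; in pure Python they are limited by the GIL, so a real tool would more likely rely on processes or on tensor libraries that release the GIL.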

@xylian86 xylian86 marked this pull request as ready for review October 23, 2024 15:22
@loadams
Contributor

loadams commented Oct 25, 2024

Hi @xylian86, thanks for the contribution. Could you run the pre-commit formatter locally to fix the formatting error?

@loadams loadams requested review from tohtana and removed request for awan-10 October 25, 2024 01:57
@xylian86
Contributor Author

@loadams Thank you for the reminder. I ran the pre-commit formatter locally before opening the PR, but it seems that formatting output can vary across environments. After switching to a new machine with Ubuntu 22.04 and re-running the pre-commit formatter, I've resolved the formatting issues.

3 participants