Need a concise pipeline for best practices in implementing large model loading from pretrained and resume training under FSDP/TP scenarios? #1038

Answered by wwwjn
tangjiasheng asked this question in Q&A
Thanks for bringing this up! I discussed this with @tianyu-l yesterday, and we plan to make a simple tutorial (on top of a torchtitan fork) that lets users focus more on DTensor-based parallelism. Instead of trimming the current torchtitan, we aim to provide examples of building blocks, starting from scratch. E.g., Step 1: start with a model running on a single GPU -> Step 2: add FSDP on top of it -> Step 3: add more parallelism on top of Step 2 -> Step 4: add other features like meta-device initialization -> Step 5: add more and more features.

Do you think this would be helpful to the community? Thanks again for your feedback!
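
For illustration only, here is a rough single-process sketch of what Step 1 and Step 4 from the plan above might look like, combined with loading pretrained weights. It is not the planned tutorial and not torchtitan code: `TinyModel` is a hypothetical stand-in for a real model, and the "pretrained" weights are simply a freshly initialized copy. It assumes PyTorch 2.0 or later, where `torch.device` can be used as a context manager.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a real transformer (illustration only)."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))


# Step 1: an ordinary single-device model. Its state dict plays the role
# of a "pretrained" checkpoint in this sketch.
torch.manual_seed(0)
pretrained = TinyModel()
pretrained_state = pretrained.state_dict()

# Step 4: build the same architecture on the meta device, so no real
# parameter memory is allocated yet.
with torch.device("meta"):
    model = TinyModel()
assert all(p.is_meta for p in model.parameters())

# Assumption: in a multi-GPU run, parallelism (Steps 2 and 3, e.g. FSDP
# wrapping) would be applied here, before the parameters are materialized.

# Materialize uninitialized storage on the target device, then fill it
# with the pretrained weights.
model.to_empty(device="cpu")
model.load_state_dict(pretrained_state)

# The materialized model should now match the original one.
x = torch.randn(2, 16)
same = torch.allclose(model(x), pretrained(x))
```

In a sharded setup the final `load_state_dict` call would typically be replaced by a distributed-aware loading path, so treat the last few lines as a placeholder for that step rather than a recommendation.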

Replies: 6 comments 16 replies

1 reply
@tangjiasheng
7 replies
@tangjiasheng
@tianyu-l
tianyu-l Apr 8, 2025
Collaborator

@tangjiasheng
@wwwjn
wwwjn Apr 9, 2025
Collaborator

Answer selected by tangjiasheng
@tangjiasheng
@tianyu-l
@tangjiasheng
8 replies
@tangjiasheng
@tangjiasheng
@fegin
fegin May 8, 2025
Collaborator

@fegin
fegin May 8, 2025
Collaborator

@tangjiasheng
Category
Q&A
4 participants