Your question
Has anyone been able to get Llama 3 70B training on 2x8 80GB GPUs? I have tried every setting I could think of but keep getting OOM errors. If anyone has gotten it to work, I would like to know the arguments and details.
By my calculations it should take around 540 GB to train, and I think I have more than enough memory available.
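For comparison, a common model-state accounting for full mixed-precision Adam training comes out noticeably higher than 540 GB once fp32 master weights and both Adam moments are counted. The byte counts per parameter and the ZeRO-3-style full sharding across all 16 GPUs in the sketch below are assumptions, not settings from this issue, and activations are ignored entirely:

```python
# Rough model-state memory estimate for Llama 3 70B full training.
# Assumptions (not from the issue): bf16 params/grads, fp32 master
# weights, fp32 Adam moments, states fully sharded across all GPUs
# (ZeRO-3-style). Activation memory is ignored.

PARAMS = 70e9        # Llama 3 70B parameter count
GPUS = 2 * 8         # two nodes x eight GPUs
GPU_MEM_GB = 80

bytes_per_param = (
    2    # bf16 parameters
    + 2  # bf16 gradients
    + 4  # fp32 master copy of parameters
    + 4  # fp32 Adam first moment
    + 4  # fp32 Adam second moment
)

total_gb = PARAMS * bytes_per_param / 1e9
per_gpu_gb = total_gb / GPUS  # best case: states sharded evenly

print(f"model states: {total_gb:.0f} GB total, {per_gpu_gb:.0f} GB per GPU")
print(f"left per GPU for activations etc.: {GPU_MEM_GB - per_gpu_gb:.0f} GB")
# model states: 1120 GB total, 70 GB per GPU
# left per GPU for activations etc.: 10 GB
```

Under that accounting the model states alone leave only about 10 GB per GPU for activations, the CUDA context, and communication buffers, which may explain an OOM even when the headline total appears to fit.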
Llama 3 70B has 80 transformer layers, but the largest configuration I can fit is 70 layers with a sequence length of 16. I have tested essentially every combination of arguments in the settings.
I saw one person's comment suggesting they had it working, but when I checked their forked repo, the arguments describing the model were off: the config had 24 transformer blocks instead of 80.