[QUESTION] #1308

Open
eliird opened this issue Dec 2, 2024 · 0 comments
Comments

eliird commented Dec 2, 2024

Your question
Has anyone been able to get Llama 3 70B training on 2x8 80 GB GPUs? I have tried every combination of settings I can think of but keep getting OOM errors. If anyone has succeeded, I would like to know the arguments and details.

By my calculations, training should take around 540 GB, so I think I have more than enough memory.
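For context, here is one common back-of-envelope estimate (a rough sketch, not necessarily this repo's exact accounting; it assumes standard mixed-precision Adam with bf16 weights/gradients plus fp32 master weights and moments, and ignores activation memory and any optimizer-state sharding):

```python
# Rough static-memory estimate for full training of a 70B-parameter model
# with Adam in mixed precision. Assumptions: bf16 weights and gradients,
# fp32 master weights and Adam moments, no activations, no sharding.
PARAMS = 70e9  # Llama 3 70B parameter count

bytes_per_param = (
    2    # bf16 weights
    + 2  # bf16 gradients
    + 4  # fp32 master weights
    + 4  # fp32 Adam first moment (m)
    + 4  # fp32 Adam second moment (v)
)        # = 16 bytes/param for standard mixed-precision Adam

total_gib = PARAMS * bytes_per_param / 1024**3
print(f"static training state, unsharded: {total_gib:,.0f} GiB")   # ~1,043 GiB
print(f"per GPU if sharded across 16 GPUs: {total_gib / 16:,.0f} GiB")  # ~65 GiB
```

Under these assumptions the static state alone is around 1 TiB, or roughly 65 GiB per GPU if fully sharded across all 16 GPUs, which leaves little headroom on an 80 GB card for activations. So whether 540 GB is enough seems to depend heavily on how the optimizer state is sharded and on activation recomputation settings.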

Llama 3 70B has 80 transformer layers, and the largest model I can fit is 70 layers with a sequence length of 16. I have tested essentially every possible combination of arguments in the settings.

I saw one person's comment suggesting they got it working, but when I checked their forked repo, the arguments describing the model were off: the settings specified 24 transformer layers instead of 80.
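For comparison, these are the published Llama 3 70B architecture hyperparameters (a sketch from the official model config; the dictionary keys are illustrative names, not necessarily this repo's exact argument names):

```python
# Reference architecture for Llama 3 70B, per the official model config.
# A fork training with 24 layers is a much smaller model, not the 70B.
LLAMA3_70B = {
    "num_layers": 80,
    "hidden_size": 8192,
    "ffn_hidden_size": 28672,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,   # grouped-query attention
    "vocab_size": 128256,
    "max_seq_length": 8192,
    "rope_theta": 500_000.0,
}
```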
