Replies: 1 comment
-
If I use the premade training script in NeMo, will that give me a 128K context length or a 4096 context length?
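Not an authoritative answer, but one way to find out is to build the premade recipe object and read its settings before launching anything. Below is a minimal sketch assuming the NeMo 2.0 recipe API, that your NeMo build ships a `qwen25_7b` recipe module, and that the recipe exposes `model.config.seq_length` and `data.seq_length` at the attribute paths shown (all of these are assumptions to verify against your install):

```python
from nemo.collections import llm

# Build the premade pretraining recipe without running it; in NeMo 2.0 this
# returns a nemo_run configuration object whose fields can be inspected
# and overridden before launch.
recipe = llm.qwen25_7b.pretrain_recipe(   # assumed recipe module name
    name="qwen25_7b_pretrain",
    dir="/tmp/checkpoints",               # placeholder checkpoint directory
    num_nodes=1,
    num_gpus_per_node=8,
)

# Print the sequence lengths the premade script would actually train with.
# If these read 4096, the stock recipe trains at 4K, not 128K.
print("model seq_length:", recipe.model.config.seq_length)
print("data  seq_length:", recipe.data.seq_length)
```

Whatever the defaults turn out to be, both fields can be overridden before launching, as sketched after the question below.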
-
Hi Team,
Thank you for your excellent work.
Could you please tell me how to pretrain Qwen 2.5 7B with a 128K context length?
Where can I find this recipe?
Thanks again!
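There may well be an official long-context recipe I am not aware of, but as a starting point here is a hedged sketch of overriding the stock recipe to a 131072-token (128K) sequence length with nemo-run. The attribute paths (`recipe.model.config.seq_length`, `recipe.data.seq_length`, `recipe.trainer.strategy.context_parallel_size`) and the executor settings follow the NeMo 2.0 recipe pattern but are assumptions, not a confirmed 128K recipe:

```python
import nemo_run as run
from nemo.collections import llm

# Start from the premade Qwen 2.5 7B pretraining recipe (assumed module name).
recipe = llm.qwen25_7b.pretrain_recipe(
    name="qwen25_7b_128k",
    dir="/tmp/checkpoints",   # placeholder checkpoint directory
    num_nodes=1,
    num_gpus_per_node=8,
)

# Override the sequence length on both the model config and the data module;
# 131072 tokens = 128K. These attribute paths are assumptions to verify.
recipe.model.config.seq_length = 131072
recipe.data.seq_length = 131072

# 128K sequences rarely fit on a single GPU's activation memory; splitting
# the sequence across GPUs via Megatron context parallelism is one option.
recipe.trainer.strategy.context_parallel_size = 8
recipe.data.micro_batch_size = 1  # keep per-GPU activation memory down

# Launch locally with torchrun; swap in a SlurmExecutor for multi-node runs.
executor = run.LocalExecutor(ntasks_per_node=8, launcher="torchrun")
run.run(recipe, executor=executor)
```

Worth noting: Qwen 2.5's published 128K support relies on RoPE scaling (YaRN), so simply raising `seq_length` does not by itself reproduce Qwen's long-context setup; the model config's RoPE settings would need checking as well.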