Best practice for training LLaMA models in Megatron-LM
Annotations of interesting ML papers I read
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)*
A LLaMA1/LLaMA2 Megatron implementation.
Megatron-LM/GPT-NeoX compatible Text Encoder with 🤗Transformers AutoTokenizer.
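As a rough illustration of how such an encoder would plug into 🤗 Transformers, here is a minimal sketch using `AutoTokenizer`; the `gpt2` checkpoint name is only an example, and any vocabulary compatible with your Megatron-LM / GPT-NeoX model could be substituted.

```python
from transformers import AutoTokenizer

# Minimal sketch: load a GPT-style tokenizer via AutoTokenizer.
# "gpt2" is an illustrative checkpoint; swap in the tokenizer that
# matches your Megatron-LM / GPT-NeoX vocabulary.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Encode a sample sentence into token IDs and decode it back.
ids = tokenizer("Megatron-LM makes large-scale training practical.")["input_ids"]
print(ids)
print(tokenizer.decode(ids))
```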
Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine