forked from AI-Hypercomputer/maxtext
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 18 changed files with 132 additions and 16 deletions.
@@ -0,0 +1,31 @@
# What is MaxText?

MaxText is a Google-initiated open-source project: a high-performance, highly scalable LLM codebase written in pure Python/[JAX](https://jax.readthedocs.io/en/latest/index.html) and targeting Google Cloud TPUs and GPUs for training and inference.

MaxText achieves very high MFU (Model FLOPs Utilization) and scales from a single host to very large clusters while staying simple and "optimization-free".

MaxText additionally provides highly optimized reference implementations for popular open-source models such as:

- Llama 2, 3, and 3.1
- Mistral and Mixtral
- Gemma and Gemma 2
- GPT

These reference implementations support pre-training and full fine-tuning. MaxText also allows you to create models of various sizes for benchmarking purposes.

The key value propositions of using MaxText for pre-training or full fine-tuning are:

- Very high performance, averaging around 50% MFU.
- Open code base: the code is available on GitHub at [AI-Hypercomputer/maxtext](https://github.com/AI-Hypercomputer/maxtext).
- Easy to understand: MaxText is written purely in JAX and Python, which makes it accessible to ML developers interested in inspecting the implementation or stepping through it. It is written at the block-by-block level, with code for embeddings, attention, normalization, etc. Different attention mechanisms such as MQA and GQA are present (see the sketch below). For quantization, it uses the JAX AQT library. The implementation is suitable for both GPUs and TPUs.
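
To make the block-by-block point concrete, here is a minimal, illustrative sketch of grouped-query attention (GQA) in plain JAX. This is not MaxText's actual implementation; the function name and shapes are assumptions made for the example, and MQA falls out as the special case of a single KV head.

```python
import jax
import jax.numpy as jnp

def grouped_query_attention(q, k, v):
    """Grouped-query attention: several query heads share one key/value head.

    q:    [batch, q_len, num_q_heads, head_dim]
    k, v: [batch, kv_len, num_kv_heads, head_dim], num_q_heads % num_kv_heads == 0
    """
    head_dim = q.shape[-1]
    group_size = q.shape[2] // k.shape[2]
    # Repeat each KV head so every query head has a matching key/value head.
    k = jnp.repeat(k, group_size, axis=2)
    v = jnp.repeat(v, group_size, axis=2)
    scores = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(head_dim)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("bhqk,bkhd->bqhd", weights, v)

# Example: 8 query heads sharing 2 KV heads (MQA would use 1 KV head).
q = jnp.ones((1, 16, 8, 64))
k = jnp.ones((1, 16, 2, 64))
v = jnp.ones((1, 16, 2, 64))
out = grouped_query_attention(q, k, v)  # shape (1, 16, 8, 64)
```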

MaxText aims to be a launching point for ambitious LLM projects in both research and production. We encourage users to start by experimenting with MaxText out of the box, then fork and modify it to meet their needs.

!!! note

    MaxText today supports only pre-training and full fine-tuning of models. It does not support PEFT/LoRA, supervised fine-tuning, or RLHF.

## Who is the target user of MaxText?

- Any individual or company interested in forking MaxText, using it as a reference implementation of a high-performance large language model, and building their own LLMs on TPUs and GPUs.
- Any individual or company interested in pre-training or full fine-tuning of the supported open-source models. They can use MaxText as a black box to perform full fine-tuning; MaxText attains an extremely high MFU, resulting in large savings in training costs.
@@ -0,0 +1,10 @@
# Data Loading

MaxText supports the following input data pipelines:

- tf.data*
- Grain
- Hugging Face Datasets

*tf.data is the most performant way of loading large-scale datasets.

You can read more about the pipelines in [Data Input Pipeline](getting_started/Data_Input_Pipeline.md).
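
As a rough illustration of what a tf.data text pipeline looks like, here is a generic sketch. It is not MaxText's actual input pipeline; the file name, batch size, and shuffle buffer are placeholder assumptions.

```python
import tensorflow as tf

# Generic tf.data sketch: stream text lines, shuffle, batch, and prefetch.
# "my_corpus.txt" is a placeholder; tokenization would normally follow.
dataset = (
    tf.data.TextLineDataset("my_corpus.txt")
    .shuffle(buffer_size=10_000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

for batch in dataset.take(1):
    print(batch.shape)  # (32,) -- a batch of raw text lines
```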
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,16 @@
# Steps to build a Model

![Stages of LLM model development](images/maxtext_contributions.png)
_Fig. 1: Stages of LLM model development, from pre-training to fine-tuning and finally serving a model._

Model building starts with pre-training a base model architecture. Pre-training is the process of taking a model architecture, which starts with random weights, and training it on a very large corpus on the scale of trillions of tokens. For example, Google's Gemma models were pre-trained on 6 trillion tokens; Llama 3 was trained on 15 trillion tokens.

After pre-training, most model producers publish a checkpoint of the model weights. The corpus used for pre-training these models is usually a large public one, such as Common Crawl, public code bases, books, etc.

Though these models may be great at answering very general questions or prompts, they usually fall short on domain-specific questions, for example in medical and life sciences or engineering.

Customers and enterprises usually like to continue training a pre-trained model, or to perform full fine-tuning, using their own datasets, which typically contain billions of tokens. This yields better prompt understanding when questions involve keywords and terms specific to their domain.

After full fine-tuning, most models go through instruction fine-tuning (PEFT/LoRA), supervised fine-tuning, and RLHF to improve model quality and prompt following.

PEFT/LoRA and supervised fine-tuning are less expensive operations than full fine-tuning.
@@ -0,0 +1 @@
{% include "../README.md" %}
@@ -0,0 +1,15 @@
# MaxText Code Organization

MaxText is written purely in JAX and Python. Below are some folders and files that show the high-level organization of the code, along with some key files.

File/Folder | Description
---------|---------------------------------
`configs` | Contains all the config files, including model configs (Llama 2, Mistral, etc.) and pre-optimized configs for different model sizes on different TPUs
`input_pipeline` | Code for the input training data pipelines
`layers` | Model layer implementations
`end_to_end` | Example scripts to run MaxText
`MaxText/train.py` | The main training script you will run directly
`MaxText/configs/base.yml` | The base configuration file containing all the related settings: checkpointing, model architecture, sharding schema, data input, learning rate, profiling, compilation, and decoding
`MaxText/decode.py` | A script to run offline inference with a sample prompt
`setup.sh` | Bash script that installs all needed library dependencies
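
As a conceptual illustration of what the main training script does at its core, the sketch below shows a jitted JAX train step. This is a simplified example, not MaxText's actual `train.py`; the toy model, parameter names, and choice of Optax optimizer are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp
import optax  # stand-in optimizer library for this sketch

# Toy parameters, optimizer, and loss for illustration only.
params = {"w": jnp.zeros((128, 128))}
optimizer = optax.adamw(learning_rate=1e-3)
opt_state = optimizer.init(params)

def loss_fn(params, batch):
    preds = batch["inputs"] @ params["w"]  # stand-in for the real model forward pass
    return jnp.mean((preds - batch["targets"]) ** 2)

@jax.jit
def train_step(params, opt_state, batch):
    # Compute loss and gradients, then apply the optimizer update.
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

batch = {"inputs": jnp.ones((8, 128)), "targets": jnp.ones((8, 128))}
params, opt_state, loss = train_step(params, opt_state, batch)
```

The real training script layers configuration, sharding, checkpointing, and the input pipelines described above around a step of roughly this shape.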
@@ -0,0 +1 @@
# Configuration options
@@ -0,0 +1,36 @@
site_name: MaxText Documentation

theme:
  name: material
  features:
    - navigation.tabs
    - navigation.expand

docs_dir: docs

plugins:
  - search
  - include-markdown

markdown_extensions:
  - tables

nav:
  - Home: index.md
  - about.md
  - Getting started:
      - getting_started/First_run.md
      - getting_started/steps_model.md
      - End-to-end example: https://www.kaggle.com/code/melissawm/maxtext-examples
  - Advanced usage:
      - getting_started/Run_MaxText_via_multihost_job.md
      - getting_started/Run_MaxText_via_multihost_runner.md
      - getting_started/Run_MaxText_via_xpk.md
      - getting_started/Use_Vertex_AI_Tensorboard.md
      - getting_started/Run_Llama2.md
      - data_loading.md
  - Reference:
      - reference/code_organization.md
      - reference/config_options.md
      - getting_started/Data_Input_Pipeline.md
      - getting_started/Data_Input_Perf.md
@@ -0,0 +1,2 @@
mkdocs-material
mkdocs-include-markdown-plugin