axolotl_memory

Small experiments on estimating transformer memory from axolotl config files.

This is primarily a learning project in which I am looking to predict the memory consumption of an LLM from an axolotl config file alone. The hope is that, once we have a reasonable solution for this, we can move it over to the axolotl project directly. Issue Here

How to Use

Pass the path of an axolotl config file to the main script via the '--config' flag to estimate its memory footprint.

python main.py --config examples/code-llama/7b/lora.yml

This would return:

┌─────────────────────────────────────────────────────────────────┐
│                         Memory Estimate                         │
├──────────────────────────────────────────┬───────────┬──────────┤
│ Modelling                                │ Precision │  Memory  │
├──────────────────────────────────────────┼───────────┼──────────┤
│  Base Model (codellama/CodeLlama-7b-hf)  │   BIT8    │  6.2GiB  │
│  LORA Adapter                            │   BIT16   │ 152.5MiB │
├──────────────────────────────────────────┼───────────┼──────────┤
│ Training                                 │ Precision │  Memory  │
├──────────────────────────────────────────┼───────────┼──────────┤
│  Gradients                               │   BIT16   │ 152.5MiB │
│  Optimizer (adamw_bnb_8bit)              │   BIT8    │ 152.5MiB │
│  Activations                             │   MIXED   │ 201.8MiB │
└──────────────────────────────────────────┴───────────┴──────────┘

Functionality

Borrowing from here, we group memory requirements into three broad buckets:

1. Model Memory

The memory required to load the model itself: the base model, quantized or unquantized, plus any PEFT adapters. A rough sketch of this estimate follows the table below.

| Model | Base Model | 4bit | 8bit | LORA | QLORA | GPTQ | GPTQ w/Flash Attn | flash attn | xformers attn |
|-------|------------|------|------|------|-------|------|-------------------|------------|---------------|
| Llama | ✔️         | ✔️   | ✔️   | ✔️   |       |      |                   |            |               |
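
To make this bucket concrete, here is a minimal sketch that turns parameter counts and precision into a byte estimate. The helper names and the parameter counts are illustrative assumptions, not this repository's actual API.

```python
# Minimal sketch: model memory from parameter count and precision.
# The names and numbers here are illustrative assumptions, not this repo's API.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,  # "BIT16" in the example output above
    "int8": 1.0,  # "BIT8", e.g. load_in_8bit
    "nf4": 0.5,   # 4-bit quantization, e.g. QLoRA
}


def base_model_memory(n_params: float, precision: str) -> float:
    """Bytes needed to hold the base model weights at a given precision."""
    return n_params * BYTES_PER_PARAM[precision]


def lora_adapter_params(n_layers: int, in_features: int, rank: int,
                        n_target_modules: int) -> int:
    """Rough LoRA parameter count: each adapted weight gets two rank-r matrices."""
    # Assumes every targeted module is roughly in_features x in_features.
    return n_layers * n_target_modules * 2 * in_features * rank


if __name__ == "__main__":
    # CodeLlama-7b loaded in 8-bit: ~6.7e9 params * 1 byte/param ≈ 6.2 GiB,
    # in the same ballpark as the "Base Model" row in the example output above.
    print(f"{base_model_memory(6.7e9, 'int8') / 2**30:.1f} GiB")
```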

2. Gradient & Optimizer Memory

The memory required to hold gradients and optimizer state for a backward pass of the model. A rough sketch follows the table below.

| Optimizer          | Basic |
|--------------------|-------|
| sgd                | ✔️    |
| adamw_hf           | ✔️    |
| adamw_torch        | ✔️    |
| adamw_torch_fused  | ✔️    |
| adamw_apex_fused   |       |
| adamw_anyprecision |       |
| adafactor          |       |
| adamw_bnb_8bit     | ✔️    |
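
To make this bucket concrete, here is a hedged sketch using common rules of thumb: gradients held at 16-bit precision and AdamW keeping two moments per trainable parameter. The per-parameter byte counts are general assumptions, not values read from this repository's code.

```python
# Rough sketch of gradient and optimizer memory for LoRA-style fine-tuning.
# Byte counts are common rules of thumb (assumptions), not this repo's values.

# Optimizer state bytes kept per *trainable* parameter.
OPTIMIZER_STATE_BYTES = {
    "sgd": 0,              # no extra state without momentum
    "adamw_hf": 8,         # two fp32 moments, 4 bytes each
    "adamw_torch": 8,
    "adamw_torch_fused": 8,
    "adamw_bnb_8bit": 2,   # two 8-bit moments, 1 byte each
}


def gradient_memory(trainable_params: int, grad_bytes: int = 2) -> int:
    """One gradient per trainable parameter; bf16/fp16 gradients take 2 bytes."""
    return trainable_params * grad_bytes


def optimizer_memory(trainable_params: int, optimizer: str) -> int:
    """Optimizer state scales with trainable parameters only (e.g. LoRA weights)."""
    return trainable_params * OPTIMIZER_STATE_BYTES[optimizer]


if __name__ == "__main__":
    lora_params = 80_000_000  # illustrative trainable-parameter count
    print(f"{gradient_memory(lora_params) / 2**20:.1f} MiB of gradients")
    print(f"{optimizer_memory(lora_params, 'adamw_bnb_8bit') / 2**20:.1f} MiB of optimizer state")
```

With LoRA only the adapter weights are trainable, so 16-bit gradients and 8-bit AdamW state each work out to roughly two bytes per adapter parameter, which is consistent with those rows matching the adapter size in the example output above.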

3. Activation Memory

The memory required to hold activations during a forward pass of the model.
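
One commonly cited approximation for this bucket is the per-layer activation estimate from Korthikanti et al., "Reducing Activation Recomputation in Large Transformer Models" (2022), sketched below. Whether this repository uses that exact formula is an assumption, and gradient checkpointing, flash attention and the config's micro-batch settings can shrink the real number dramatically.

```python
# Hedged sketch of a standard per-layer activation estimate at 16-bit precision
# (Korthikanti et al., 2022). This is an assumption, not necessarily the
# formula used by this repository.


def activations_per_layer_bytes(seq_len: int, batch: int, hidden: int, n_heads: int) -> float:
    """Approximate activation bytes for one transformer layer, no recomputation."""
    s, b, h, a = seq_len, batch, hidden, n_heads
    # 34*s*b*h covers linear/LayerNorm/MLP activations plus dropout masks;
    # 5*a*s^2*b covers attention scores and the softmax output.
    return s * b * h * (34 + 5 * a * s / h)


if __name__ == "__main__":
    # Illustrative Llama-7B-like shape: 32 layers, hidden size 4096, 32 heads.
    per_layer = activations_per_layer_bytes(seq_len=4096, batch=1, hidden=4096, n_heads=32)
    print(f"{32 * per_layer / 2**30:.1f} GiB without gradient checkpointing")
```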
