axolotl_memory

Small experiments on estimating transformer memory from axolotl config files.

This is primarily a learning project in which I am looking to predict the memory consumption of an LLM from an axolotl config file alone. The hope is that, once we have a reasonable solution for this, we can move it over to the axolotl project directly. Issue Here

How to Use

Pass the path of an axolotl config file to the main script via the '--config' flag to estimate its memory footprint.

python main.py --config examples/code-llama/7b/lora.yml

This would return:

┌─────────────────────────────────────────────────────────────────┐
│                         Memory Estimate                         │
├──────────────────────────────────────────┬───────────┬──────────┤
│ Modelling                                │ Precision │  Memory  │
├──────────────────────────────────────────┼───────────┼──────────┤
│  Base Model (codellama/CodeLlama-7b-hf)  │   BIT8    │  6.2GiB  │
│  LORA Adapter                            │   BIT16   │ 152.5MiB │
├──────────────────────────────────────────┼───────────┼──────────┤
│ Training                                 │ Precision │  Memory  │
├──────────────────────────────────────────┼───────────┼──────────┤
│  Gradients                               │   BIT16   │ 152.5MiB │
│  Optimizer (adamw_bnb_8bit)              │   BIT8    │ 152.5MiB │
│  Activations                             │   MIXED   │ 201.8MiB │
└──────────────────────────────────────────┴───────────┴──────────┘

Functionality

Borrowing from here, we group memory requirements into three broad buckets:

1. Model Memory

The memory required to load the model itself: the base model, quantized or unquantized, plus any PEFT adapters. A rough sketch of this estimate follows the table below.

| Model | Base Model | 4bit | 8bit | LORA | QLORA | GPTQ | GPTQ w/Flash Attn | flash attn | xformers attn |
|-------|------------|------|------|------|-------|------|-------------------|------------|---------------|
| Llama | ✔️         | ✔️   | ✔️   | ✔️   |       |      |                   |            |               |
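
To make this bucket concrete, here is a minimal sketch that turns parameter counts and precision into a byte estimate. The helper names and the parameter counts are illustrative assumptions, not this repository's actual API.

```python
# Minimal sketch: model memory from parameter count and precision.
# The names and numbers here are illustrative assumptions, not this repo's API.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,  # "BIT16" in the example output above
    "int8": 1.0,  # "BIT8", e.g. load_in_8bit
    "nf4": 0.5,   # 4-bit quantization, e.g. QLoRA
}


def base_model_memory(n_params: float, precision: str) -> float:
    """Bytes needed to hold the base model weights at a given precision."""
    return n_params * BYTES_PER_PARAM[precision]


def lora_adapter_params(n_layers: int, in_features: int, rank: int,
                        n_target_modules: int) -> int:
    """Rough LoRA parameter count: each adapted weight gets two rank-r matrices."""
    # Assumes every targeted module is roughly in_features x in_features.
    return n_layers * n_target_modules * 2 * in_features * rank


if __name__ == "__main__":
    # CodeLlama-7b loaded in 8-bit: ~6.7e9 params * 1 byte/param ≈ 6.2 GiB,
    # in the same ballpark as the "Base Model" row in the example output above.
    print(f"{base_model_memory(6.7e9, 'int8') / 2**30:.1f} GiB")
```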

2. Gradient & Optimizer Memory

The memory required to hold gradients and optimizer state for a backward pass of the model. A rough sketch follows the table below.

| Optimizer          | Basic |
|--------------------|-------|
| sgd                | ✔️    |
| adamw_hf           | ✔️    |
| adamw_torch        | ✔️    |
| adamw_torch_fused  | ✔️    |
| adamw_apex_fused   |       |
| adamw_anyprecision |       |
| adafactor          |       |
| adamw_bnb_8bit     | ✔️    |
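
To make this bucket concrete, here is a hedged sketch using common rules of thumb: gradients held at 16-bit precision and AdamW keeping two moments per trainable parameter. The per-parameter byte counts are general assumptions, not values read from this repository's code.

```python
# Rough sketch of gradient and optimizer memory for LoRA-style fine-tuning.
# Byte counts are common rules of thumb (assumptions), not this repo's values.

# Optimizer state bytes kept per *trainable* parameter.
OPTIMIZER_STATE_BYTES = {
    "sgd": 0,              # no extra state without momentum
    "adamw_hf": 8,         # two fp32 moments, 4 bytes each
    "adamw_torch": 8,
    "adamw_torch_fused": 8,
    "adamw_bnb_8bit": 2,   # two 8-bit moments, 1 byte each
}


def gradient_memory(trainable_params: int, grad_bytes: int = 2) -> int:
    """One gradient per trainable parameter; bf16/fp16 gradients take 2 bytes."""
    return trainable_params * grad_bytes


def optimizer_memory(trainable_params: int, optimizer: str) -> int:
    """Optimizer state scales with trainable parameters only (e.g. LoRA weights)."""
    return trainable_params * OPTIMIZER_STATE_BYTES[optimizer]


if __name__ == "__main__":
    lora_params = 80_000_000  # illustrative trainable-parameter count
    print(f"{gradient_memory(lora_params) / 2**20:.1f} MiB of gradients")
    print(f"{optimizer_memory(lora_params, 'adamw_bnb_8bit') / 2**20:.1f} MiB of optimizer state")
```

With LoRA only the adapter weights are trainable, so 16-bit gradients and 8-bit AdamW state each work out to roughly two bytes per adapter parameter, which is consistent with those rows matching the adapter size in the example output above.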

3. Activation Memory

The memory required to hold activations during a forward pass of the model.
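
One commonly cited approximation for this bucket is the per-layer activation estimate from Korthikanti et al., "Reducing Activation Recomputation in Large Transformer Models" (2022), sketched below. Whether this repository uses that exact formula is an assumption, and gradient checkpointing, flash attention and the config's micro-batch settings can shrink the real number dramatically.

```python
# Hedged sketch of a standard per-layer activation estimate at 16-bit precision
# (Korthikanti et al., 2022). This is an assumption, not necessarily the
# formula used by this repository.


def activations_per_layer_bytes(seq_len: int, batch: int, hidden: int, n_heads: int) -> float:
    """Approximate activation bytes for one transformer layer, no recomputation."""
    s, b, h, a = seq_len, batch, hidden, n_heads
    # 34*s*b*h covers linear/LayerNorm/MLP activations plus dropout masks;
    # 5*a*s^2*b covers attention scores and the softmax output.
    return s * b * h * (34 + 5 * a * s / h)


if __name__ == "__main__":
    # Illustrative Llama-7B-like shape: 32 layers, hidden size 4096, 32 heads.
    per_layer = activations_per_layer_bytes(seq_len=4096, batch=1, hidden=4096, n_heads=32)
    print(f"{32 * per_layer / 2**30:.1f} GiB without gradient checkpointing")
```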
