Fix Llama-3.2-Vision #163

Merged 1 commit from pc/fix-mllama into pc/refactor-utils-1 on Dec 30, 2024

Conversation

@Blaizzy (Owner) commented Dec 30, 2024

This PR introduces language-only support and significant improvements in generation speed and memory usage for Llama-3.2-11B-Vision-Instruct-4bit on MLX-VLM. Below are the key metrics comparing performance before and after, along with the percentage changes:

| Metric | Before | After | % Diff |
| --- | --- | --- | --- |
| Prompt Tokens/sec | 2.807 | 2.825 | +0.64% |
| Generation Tokens/sec | 0.362 | 6.692 | +1749% |
| Peak Memory (GB) | 65.453 | 16.252 | -75.2% |
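
For reference, the headline generation number reconciles like this:

```python
# Deriving the generation-speed deltas from the table above
before, after = 0.362, 6.692                  # generation tokens/sec
pct_change = (after - before) / before * 100  # ~= 1748.6 -> reported as +1749%
speedup = after / before                      # ~= 18.5x
```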

Key Improvements

  1. Generation Speed: We achieved roughly an 18.5x increase in tokens generated per second.
  2. Memory Efficiency: Reduced peak memory usage by approximately 75%.
  3. Slightly Faster Prompt Handling: Prompt ingestion speed improved by about 0.64%.

These optimizations should enable more efficient inference and more stable performance overall.

Please review the changes for correctness and let me know if you have any questions or concerns!
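
If you want to exercise the new language-only path while reviewing, a minimal sketch along these lines should do it (the exact `mlx_vlm` entry points and signatures here are assumptions and may differ between versions):

```python
# Minimal language-only (no image) generation sketch for MLX-VLM.
# NOTE: load/generate signatures are assumed and may vary across versions.
from mlx_vlm import load, generate

model, processor = load("mlx-community/Llama-3.2-11B-Vision-Instruct-4bit")

# No image is passed, so the vision tower is never run; that is the path
# whose speed and memory usage this PR improves.
output = generate(
    model,
    processor,
    prompt="Summarize the difference between a VLM and an LLM in one sentence.",
    max_tokens=256,  # the new default, per the commit list below
)
print(output)
```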

[Figure: llama-3.2-vision performance improvement, Language Only vs VLM]

Closes #100 (Very High Memory Usage for Llama-3.2-11B-Vision-Instruct-4bit)

@Blaizzy Blaizzy changed the base branch from main to pc/refactor-utils-1 December 30, 2024 01:05
@Blaizzy Blaizzy merged commit 8740c0a into pc/refactor-utils-1 Dec 30, 2024
1 check passed
@Blaizzy Blaizzy deleted the pc/fix-mllama branch December 30, 2024 01:06
Blaizzy added a commit that referenced this pull request Dec 30, 2024
* remove unused

* add default layer_norm

* remove unused

* remove llava_bunny and idefics2 custom configs

* refactor molmo and qwen2 config

* add deprecation warning

* refactor update model configs

* refactor sanitize weights

* refactor class_predicate (see the quantization sketch after this list)

* move custom config logic to from_dict

* uncomment

* fix config name

* rename aligner to projector

* fix tests

* remove module from update list

* add trusted remote as kwargs

* update baseImageProcessor

* refactor image processor

* pin latest transformers

* bump version

* refactor prepare inputs

* simplify image loading

* fix load_image and refactor load_config

* make skip_non_divisible a default

* skip non divisible default and rename model inputs

* refactor condition

* fix language input only

* add fetch KV

* Increase default max tokens to 256

* refactor generate, generate step and stream

* fix high usage and add language only support (#163)
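
To illustrate the `class_predicate` / `skip_non_divisible` items above: a common pattern is to quantize only the layers whose trailing weight dimension divides the quantization group size. The sketch below uses mlx's `nn.quantize` with a hypothetical predicate and a toy model; it is not the repo's actual implementation, and the predicate's `(path, module)` signature is an assumption about the mlx version in use.

```python
# Illustrative sketch (not the repo's actual code): quantize linear-like
# layers, skipping any whose trailing weight dim is not divisible by the
# quantization group size.
import mlx.nn as nn

GROUP_SIZE = 64

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(128, 128)  # 128 % 64 == 0 -> quantized
        self.head = nn.Linear(100, 10)   # 100 % 64 != 0 -> skipped

def class_predicate(path, module):
    # Only quantize layer types nn.quantize supports, and only when the
    # input dimension is divisible by the group size.
    return (
        isinstance(module, (nn.Linear, nn.Embedding))
        and module.weight.shape[-1] % GROUP_SIZE == 0
    )

model = Tiny()
nn.quantize(model, group_size=GROUP_SIZE, bits=4, class_predicate=class_predicate)
```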