How is Llava quantized? #621

Open · Abhranta opened this issue Sep 22, 2024 · 3 comments

Comments
@Abhranta

In AutoAWQ, do we only quantize the LLM part of LLaVA, or do we also quantize the ViT? Can we add support for quantizing vision models like ViT or SigLIP?
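
For context, here is a rough sketch of the standard AutoAWQ flow I have in mind (the model path and output directory are just placeholders):

# Rough sketch of the usual AutoAWQ quantization flow; paths are placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "llava-hf/llava-1.5-7b-hf"  # placeholder checkpoint, for illustration only
quant_path = "llava-1.5-7b-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# quantize the weights (calibration data is handled internally by default)
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)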

@sailfish009

@Abhranta Hi, one option is AutoGPTQ:

from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
)

"""
Download https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview to local
Make following edits to the config.json
LlavaLlamaForCausalLM -> LlamaForCausalLM
"model_type": "llava" -> "llama"
"""
pretrained_model_dir = "./checkpoints/llava-llama-2-13b-chat-lightning-preview"

quantized_model_dir = "llava-llama-2-13b-chat-lightning-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
    )
]

quantize_config = BaseQuantizeConfig(
    bits=4,  # quantize the model to 4-bit
    group_size=128,  # 128 is the recommended group size
    desc_act=False,  # False speeds up inference significantly, at a small cost in perplexity
)

# load the un-quantized model; by default it is loaded into CPU memory
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

# quantize the model; `examples` must be a list of dicts whose only keys are "input_ids" and "attention_mask"
model.quantize(examples)

# save quantized model using safetensors
model.save_quantized(quantized_model_dir, use_safetensors=True)
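
If it helps, here is a minimal sketch of loading the quantized checkpoint back for inference afterwards (standard AutoGPTQ usage; the device string and prompt are just placeholders):

# Minimal sketch: reload the quantized checkpoint for inference (assumes a CUDA device).
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

pretrained_model_dir = "./checkpoints/llava-llama-2-13b-chat-lightning-preview"
quantized_model_dir = "llava-llama-2-13b-chat-lightning-4bit-128g"

# the tokenizer is not written by save_quantized, so load it from the original checkpoint
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")

pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])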

@Abhranta (Author)

Does this quantize only the LLM, or the ViT too?

@pratyush0599

Hi @sailfish009, is there no native support for LLaVA-based models? The solution you suggested seems very hacky :( I was also wondering whether the quantization applies to the vision encoder too?
