
Reads out FP16 parameters after quantization #617

Open
Jason202268 opened this issue Sep 19, 2024 · 0 comments

After running a script like the following:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Specify paths and hyperparameters for quantization
# (model_path and formatted_data are defined earlier in the script)
quant_path = "./models/test"
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load the tokenizer and model with AutoAWQ
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoAWQForCausalLM.from_pretrained(model_path, device_map="auto", safetensors=True)

# Quantize using a custom calibration dataset, then save the result
model.quantize(tokenizer, quant_config=quant_config, calib_data=formatted_data)
model.save_quantized(quant_path, safetensors=True, shard_size="4GB")
tokenizer.save_pretrained(quant_path)

I manually checked the saved parameters with the safetensors library, and the dtype shows FP16. Is this normal?
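A minimal sketch of such a dtype check with the safetensors library, assuming a single shard named model.safetensors under quant_path (a sharded checkpoint would use names like model-00001-of-00002.safetensors instead):

from safetensors import safe_open

# Hypothetical shard path; adjust to match the actual file(s) written by save_quantized
with safe_open("./models/test/model.safetensors", framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        # Print each tensor's name, dtype, and shape
        print(name, tensor.dtype, tuple(tensor.shape))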
