
Unexpected ppl diff #116

Open
YihengBrianWu opened this issue May 23, 2024 · 3 comments

Comments

@YihengBrianWu

I'm now trying to quantize llama2-7b under the w4a16g128 setting.
The script is:

python3 main.py \
    --model_name /mnt/bn/wyh-train/4bit/models/llama2-7b/model \
    --device 0 \
    --group_size 128 \
    --bits 4 \
    --iters 1000 \
    --deployment_device 'fake,cpu,gpu' \
    --output_dir "/mnt/bn/wyh-train/4bit/models/llama2-7b-auto-round"

The results are:

                                  wikitext2   c4
llama2-7b-fp16                    5.4721      6.9727
llama2-7b-w4a16g128 (auto_round)  10.4401     7.4204

Any insight here?

@wenhuach21
Contributor

This issue is documented in our paper (https://arxiv.org/pdf/2309.05516v3), Table 14, with a detailed explanation in Section 4.1. We hypothesize that perplexity is highly sensitive to outliers; however, our limited tests did not show a significant impact in real deployment. To avoid this issue, setting the minmax lr to 2.0/iterations could be a solution, based on my experiments with this model.
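
As a sketch, the command above could be rerun with that override, assuming the example main.py exposes a --minmax_lr flag (with --iters 1000, 2.0/iterations works out to 0.002):

python3 main.py \
    --model_name /mnt/bn/wyh-train/4bit/models/llama2-7b/model \
    --device 0 \
    --group_size 128 \
    --bits 4 \
    --iters 1000 \
    --minmax_lr 0.002 \
    --deployment_device 'fake,cpu,gpu' \
    --output_dir "/mnt/bn/wyh-train/4bit/models/llama2-7b-auto-round"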

@wenhuach21
Contributor

wenhuach21 commented May 23, 2024

Besides, if your GPU memory is sufficient, you could set --disable_gpu_memory_usage, which typically gives a 1.5x-2x speedup based on my experiments.
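
For example, appended to the command from the first post (all other arguments unchanged; whether this helps depends on how much GPU memory is available):

python3 main.py \
    --model_name /mnt/bn/wyh-train/4bit/models/llama2-7b/model \
    --device 0 \
    --group_size 128 \
    --bits 4 \
    --iters 1000 \
    --deployment_device 'fake,cpu,gpu' \
    --disable_gpu_memory_usage \
    --output_dir "/mnt/bn/wyh-train/4bit/models/llama2-7b-auto-round"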

@YihengBrianWu
Author

> Besides, if your GPU memory is sufficient, you could set --disable_gpu_memory_usage, which typically gives a 1.5x-2x speedup based on my experiments.

Cool! Thanks for your help!
