
[New Method]: VPTQ, Vector Post-Training Quantization #792

Open
YangWang92 opened this issue Oct 29, 2024 · 2 comments
Comments


YangWang92 commented Oct 29, 2024

The quantization format

Hi all,
We have recently designed and open-sourced a new Vector Quantization method called Vector Post-Training Quantization (VPTQ). Our code is available in the VPTQ GitHub repository, along with the algorithm details and released models.

VPTQ is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on Large Language Models (LLMs) at extremely low bit-widths (<2-bit). Impressively, VPTQ can compress models of up to 70B, and even 405B, parameters to 1-2 bits without retraining while maintaining high accuracy.

I am very interested in integrating VPTQ into the aphrodite-engine and other inference frameworks.

Thanks!
Yang

What are its advantages over the existing quantization methods?

VPTQ is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on Large Language Models (LLMs) at extremely low bit-widths (<2-bit). Impressively, VPTQ can compress models of up to 70B, and even 405B, parameters to 1-2 bits without retraining while maintaining high accuracy. VPTQ also has lower dequantization (decode) overhead than other methods.
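To make the bit-width arithmetic concrete, here is a generic vector-quantization sketch (not the actual VPTQ algorithm, which uses a more sophisticated codebook construction): weights are grouped into v-dimensional vectors, and each vector is replaced by the index of its nearest centroid in a shared codebook, so the index cost per weight is log2(k)/v bits.

```python
# Generic vector quantization sketch -- NOT the VPTQ algorithm itself.
# With k = 16 centroids and vector dimension v = 4, each weight costs
# log2(16) / 4 = 1 bit for its index, plus the small shared codebook.
import numpy as np

rng = np.random.default_rng(0)

def vq_quantize(w, v=4, k=16, iters=5):
    vecs = w.reshape(-1, v)                      # group weights into v-dim vectors
    codebook = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):                       # plain k-means refinement
        d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)                        # nearest centroid per vector
        for c in range(k):
            m = idx == c
            if m.any():
                codebook[c] = vecs[m].mean(0)
    return idx, codebook                         # indices + lookup table

def vq_dequantize(idx, codebook, shape):
    return codebook[idx].reshape(shape)          # decode is a pure gather

w = rng.standard_normal((32, 32)).astype(np.float32)
idx, cb = vq_quantize(w)
w_hat = vq_dequantize(idx, cb, w.shape)
bits_per_weight = np.log2(len(cb)) / 4           # 1.0 bit per weight here
```

The real method adds refinements (e.g. residual codebooks and outlier handling, per the VPTQ repository) to keep accuracy at these bit-widths, but the storage model above is the same: indices plus a shared codebook.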

@YangWang92 YangWang92 changed the title [New Method]: [New Method]: VPTQ, Vector Post-Training Quantization Oct 29, 2024
@AlpinDale (Member) commented:

Hello! We've already implemented QuIP# (though it's been stale for a while due to lack of use). How similar is VPTQ to QuIP#? From what I understand, VPTQ is based on it, right?

@YangWang92 (Author) commented:

Both VPTQ and QuIP# are based on vector quantization, but VPTQ's design is simpler: it does not require complex Hadamard transformations and only needs a lookup table to achieve low-bit model quantization.
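A sketch of why lookup-table decoding is cheap (an illustration, not the actual VPTQ kernel; the layer shapes and names here are hypothetical): dequantizing a vector-quantized layer at inference time is a single gather from the codebook, with no Hadamard or other transform step before the matmul.

```python
# Illustration of lookup-table dequantization for a linear layer.
# Decode = one gather from the codebook; no transform is applied.
import numpy as np

def dequant_matmul(x, idx, codebook, out_features):
    # idx: integer centroid indices; codebook: (k, v) float lookup table
    w = codebook[idx].reshape(out_features, -1)  # single table lookup
    return x @ w.T

k, v, in_f, out_f = 256, 4, 16, 8
rng = np.random.default_rng(0)
codebook = rng.standard_normal((k, v)).astype(np.float32)
idx = rng.integers(0, k, size=(out_f * in_f) // v)   # quantized weights
x = rng.standard_normal((2, in_f)).astype(np.float32)
y = dequant_matmul(x, idx, codebook, out_f)          # shape (2, out_f)
```

In a Hadamard-based scheme such as QuIP#, the decode path additionally multiplies by (fast) Hadamard matrices before the weights are usable, which is the extra overhead the comment refers to.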
