
[New Method]: VPTQ, Vector Post-Training Quantization #792

Open
YangWang92 opened this issue Oct 29, 2024 · 2 comments
Comments


YangWang92 commented Oct 29, 2024

The quantization format

Hi all,
We have recently designed and open-sourced a new Vector Quantization method called Vector Post-Training Quantization (VPTQ). Our code is available in the VPTQ GitHub repository, along with the algorithm details and released models.

VPTQ is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on Large Language Models (LLMs) at extremely low bit-widths (<2-bit). Impressively, VPTQ can compress models of up to 70B, and even 405B, parameters to 1-2 bits without retraining while maintaining high accuracy.

I am very interested in integrating VPTQ into the aphrodite-engine and other inference frameworks.

Thanks!
Yang

What are its advantages over the existing quantization methods?

VPTQ is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on Large Language Models (LLMs) at extremely low bit-widths (<2-bit). Impressively, VPTQ can compress models of up to 70B, and even 405B, parameters to 1-2 bits without retraining while maintaining high accuracy. VPTQ also has lower dequantization (decode) overhead than other methods.
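To make the bit-width arithmetic concrete, here is a generic vector-quantization sketch (not the actual VPTQ algorithm, which uses a more sophisticated codebook construction): weights are grouped into v-dimensional vectors, and each vector is replaced by the index of its nearest centroid in a shared codebook, so the index cost per weight is log2(k)/v bits.

```python
# Generic vector quantization sketch -- NOT the VPTQ algorithm itself.
# With k = 16 centroids and vector dimension v = 4, each weight costs
# log2(16) / 4 = 1 bit for its index, plus the small shared codebook.
import numpy as np

rng = np.random.default_rng(0)

def vq_quantize(w, v=4, k=16, iters=5):
    vecs = w.reshape(-1, v)                      # group weights into v-dim vectors
    codebook = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):                       # plain k-means refinement
        d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)                        # nearest centroid per vector
        for c in range(k):
            m = idx == c
            if m.any():
                codebook[c] = vecs[m].mean(0)
    return idx, codebook                         # indices + lookup table

def vq_dequantize(idx, codebook, shape):
    return codebook[idx].reshape(shape)          # decode is a pure gather

w = rng.standard_normal((32, 32)).astype(np.float32)
idx, cb = vq_quantize(w)
w_hat = vq_dequantize(idx, cb, w.shape)
bits_per_weight = np.log2(len(cb)) / 4           # 1.0 bit per weight here
```

The real method adds refinements (e.g. residual codebooks and outlier handling, per the VPTQ repository) to keep accuracy at these bit-widths, but the storage model above is the same: indices plus a shared codebook.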

@YangWang92 YangWang92 changed the title [New Method]: [New Method]: VPTQ, Vector Post-Training Quantization Oct 29, 2024
@AlpinDale (Member) commented:

Hello! We've already implemented QuIP# (though it's been stale for a while due to lack of use). How similar is VPTQ to QuIP#? From what I understand, VPTQ is based on it, right?

@YangWang92 (Author) commented:

Both VPTQ and QuIP# are based on vector quantization, but VPTQ's design is simpler: it does not require complex Hadamard transformations and only needs a lookup table to achieve low-bit model quantization.
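A sketch of why lookup-table decoding is cheap (an illustration, not the actual VPTQ kernel; the layer shapes and names here are hypothetical): dequantizing a vector-quantized layer at inference time is a single gather from the codebook, with no Hadamard or other transform step before the matmul.

```python
# Illustration of lookup-table dequantization for a linear layer.
# Decode = one gather from the codebook; no transform is applied.
import numpy as np

def dequant_matmul(x, idx, codebook, out_features):
    # idx: integer centroid indices; codebook: (k, v) float lookup table
    w = codebook[idx].reshape(out_features, -1)  # single table lookup
    return x @ w.T

k, v, in_f, out_f = 256, 4, 16, 8
rng = np.random.default_rng(0)
codebook = rng.standard_normal((k, v)).astype(np.float32)
idx = rng.integers(0, k, size=(out_f * in_f) // v)   # quantized weights
x = rng.standard_normal((2, in_f)).astype(np.float32)
y = dequant_matmul(x, idx, codebook, out_f)          # shape (2, out_f)
```

In a Hadamard-based scheme such as QuIP#, the decode path additionally multiplies by (fast) Hadamard matrices before the weights are usable, which is the extra overhead the comment refers to.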
