Hi all,
We have recently designed and open-sourced a new Vector Quantization method called Vector Post-Training Quantization (VPTQ). Our work is available in the VPTQ GitHub repository, with the algorithm detailed at this link, along with released models.
VPTQ is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on Large Language Models (LLMs) at extremely low bit-widths (below 2 bits). Impressively, VPTQ can compress models of up to 70B, and even 405B, parameters to 1-2 bits without retraining while maintaining high accuracy.
I am very interested in integrating VPTQ into the aphrodite-engine and other inference frameworks.
Thanks!
Yang
What are its advantages over the existing quantization methods?
VPTQ leverages Vector Quantization to reach high accuracy at extremely low bit-widths (below 2 bits), and it scales: models of up to 70B, and even 405B, parameters can be compressed to 1-2 bits without retraining while maintaining high accuracy. VPTQ also has lower dequantization (decode) overhead than other methods, since decoding is essentially a table lookup.
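For intuition, here is a minimal sketch of how vector-quantized weights decode through a shared lookup table, and how the sub-2-bit budget arises. This is an illustrative example only, not VPTQ's actual kernel or API; all names and shapes are made up for the sketch.

```python
import torch

def dequantize_vq(indices: torch.Tensor, codebook: torch.Tensor,
                  out_shape: tuple) -> torch.Tensor:
    """Reconstruct weights from vector-quantized storage.

    indices:  (num_vectors,) integer tensor of codebook entries
    codebook: (num_centroids, vec_len) float lookup table
    """
    vectors = codebook[indices]        # one gather per vector, no arithmetic
    return vectors.reshape(out_shape)

# Effective bit-width: log2(num_centroids) bits are stored per vector of
# vec_len weights. A 4096-entry codebook over 8-element vectors costs
# 12 / 8 = 1.5 bits per weight (ignoring the small, amortized codebook).
codebook = torch.randn(4096, 8)
indices = torch.randint(0, 4096, (1024,))
weights = dequantize_vq(indices, codebook, (64, 128))  # 1024 * 8 = 8192 values
```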
YangWang92 changed the title from "[New Method]:" to "[New Method]: VPTQ, Vector Post-Training Quantization" on Oct 29, 2024.
Hello! We've already implemented QuIP# (although it's been stale for a while due to lack of use). How similar is VPTQ to QuIP#? From what I understand, VPTQ is based on it, right?
Both VPTQ and QuIP# are based on vector quantization, but VPTQ's design is simpler: it does not require complex Hadamard transformations and only needs a lookup table to achieve lower-bit model quantization.
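To make the decode-path difference concrete, here is an illustrative contrast. Neither snippet is the real QuIP# or VPTQ implementation; the shapes, names, and the stand-in rotation are assumptions for this sketch.

```python
import torch

codebook = torch.randn(256, 8)        # 256 centroids of 8 weights each
idx = torch.randint(0, 256, (512,))   # indices for 512 quantized vectors

# VPTQ-style decode: a single table lookup (gather), nothing else.
w_lookup = codebook[idx]              # (512, 8)

# QuIP#-style decode adds an incoherence-processing rotation that must be
# undone at inference; a random orthogonal matrix stands in here for the
# structured Hadamard transform the real method uses.
H = torch.linalg.qr(torch.randn(8, 8)).Q
w_rotated = codebook[idx] @ H.T       # lookup + extra transform per block
```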