This is really two questions in one, but they're related... I suppose! I was looking to integrate OmniQuant myself, because in the user tests I've seen it performs about 5% better than MLC's built-in quantization methods (I forget exactly which methods were compared). 5% isn't a lot, but considering MLC is the best platform out there (in my humble opinion), it still performs a whole lot better than vanilla llama.cpp, AWQ, and the like.

https://github.com/OpenGVLab/OmniQuant/tree/main

OmniQuant has been around for a while and was spotlighted at the ICLR 2024 conference (the presentation is actually in a week or so, so maybe it will get more recognition then, heh).

OmniQuant applies to all models with no per-architecture specialization. While I was looking at how to fit OmniQuant into MLC's quantization framework, I noticed the newly added per-tensor method. From its script, it looked geared toward Mixtral. I'm just wondering if that's the case, and whether anyone at MLC has experience or results that could point me in the right direction. Thanks!
The per-tensor quantization that was added recently is for fp8. So far we have tested it on Mixtral and Llama; more work, such as calibration scales, is in progress.
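For readers unfamiliar with the term: per-tensor quantization uses a single scale for an entire tensor, rather than one scale per group or per channel. Below is a minimal NumPy sketch of the idea for fp8 (e4m3). It is an illustration only, not MLC's implementation; it models the scaling and clamping but skips rounding to the actual fp8 grid, and the function names are made up for this example.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in fp8 e4m3


def per_tensor_fp8_quant(x: np.ndarray):
    """Quantize with ONE scale for the whole tensor (hence "per-tensor").

    Real fp8 kernels also round to the e4m3 value grid; this sketch only
    models the scale computation and clamping so the round trip is easy
    to inspect.
    """
    amax = float(np.abs(x).max())
    # Map the tensor's largest magnitude onto the fp8 dynamic range.
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale


def dequant(q: np.ndarray, scale: float) -> np.ndarray:
    # Reverse the scaling to recover an approximation of the input.
    return q * scale


x = np.random.randn(4, 8).astype(np.float32)
q, s = per_tensor_fp8_quant(x)
x_hat = dequant(q, s)
```

The "calibration scale" work mentioned above presumably refers to choosing that single scale from calibration data (e.g. running averages of `amax` over sample activations) instead of the instantaneous max, which matters most for activations with outliers.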