[Question] Omniquant. (AFAIK) scores best for Q. Methods, why no adoption? In any case, is per-tensor quant. best for Mixtral/MoE models? #2247

Open
BuildBackBuehler opened this issue Apr 29, 2024 · 1 comment
Labels
question Question about the usage

Comments

@BuildBackBuehler

A real two-for-one question, but the two parts are related, I suppose! I was looking to integrate OmniQuant myself because it performs about 5% better than MLC quants (I forget which method(s) were used in the user tests I looked at). 5% isn't a lot, but considering that MLC is the best platform out there (in my humble opinion), OmniQuant would perform a whole lot better than vanilla llama.cpp, AWQ, and the like.

https://github.com/OpenGVLab/OmniQuant/tree/main

It has been around for a while and was selected as a spotlight at ICLR 2024 (the presentation is being given in a week or so, actually, so maybe it'll get more recognition then, heh).

Of course, OQ is applied the same way to all models, with no specialization. While I was looking at how to format OQ to fit into MLC's quantization, I noticed the newly added per-tensor method. From its script, it looks to me like it is geared towards Mixtral. I'm just wondering whether that's the case, and whether anyone at MLC has experience or results that could point me in the right direction. Thanks!

@BuildBackBuehler BuildBackBuehler added the question Question about the usage label Apr 29, 2024
@vinx13
Member

vinx13 commented Apr 30, 2024

The per-tensor quantization that was added recently is for FP8. So far we have tested it on Mixtral and Llama, and more work, such as calibration scale, is in progress.
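For readers unfamiliar with the term, here is a minimal NumPy sketch of what "per-tensor" means compared with per-group quantization. It is not MLC's implementation: the function names are hypothetical, and it uses symmetric int8 rounding as a stand-in for FP8 (which NumPy has no native dtype for). The only point is that per-tensor quantization shares a single scale across the whole weight tensor, while per-group quantization keeps one scale per small block of values.

```python
import numpy as np

def quantize_per_tensor(w, qmax=127):
    """Symmetric quantization with ONE scale shared by the whole tensor."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def quantize_per_group(w, group_size=4, qmax=127):
    """Symmetric quantization with one scale per group along the last axis."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must divide evenly into groups"
    g = w.reshape(rows, cols // group_size, group_size)
    max_abs = np.abs(g).max(axis=-1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / qmax, 1.0)
    q = np.clip(np.round(g / scales), -qmax, qmax).astype(np.int8)
    return q.reshape(rows, cols), scales.squeeze(-1)

def dequantize_per_tensor(q, scale):
    return q.astype(np.float32) * scale

def dequantize_per_group(q, scales, group_size=4):
    rows, cols = q.shape
    g = q.reshape(rows, cols // group_size, group_size).astype(np.float32)
    return (g * scales[..., None]).reshape(rows, cols)

# Illustrative weights (made up): one group of small values, one of large values.
w = np.array([[0.01, -0.02, 0.03, 0.04, 8.0, -4.0, 2.0, 1.0]], dtype=np.float32)

q_t, s_t = quantize_per_tensor(w)
q_g, s_g = quantize_per_group(w, group_size=4)

# Mean absolute reconstruction error for each scheme.
err_t = float(np.abs(dequantize_per_tensor(q_t, s_t) - w).mean())
err_g = float(np.abs(dequantize_per_group(q_g, s_g, 4) - w).mean())
print(f"per-tensor scale: {s_t:.5f}, mean abs error: {err_t:.5f}")
print(f"per-group scales: {s_g}, mean abs error: {err_g:.5f}")
```

In this toy example the single per-tensor scale is dominated by the largest weight, so the small values lose most of their precision, whereas the per-group scales adapt to each block. The trade-off is that per-tensor needs only one scale to store and apply, which is presumably part of why calibrating that scale well matters.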
