
About quantized SOTA model accuracy in AI-HUB website #156

Open
notou10 opened this issue Feb 4, 2025 · 0 comments
notou10 commented Feb 4, 2025

Hi, thanks for the great work! I have a few questions below.

Q1) While I was able to check inference speed for the SOTA models, I couldn’t find quantitative results (e.g., mAP or similar task metrics) for 8-bit quantized models beyond MobileNet in the tutorial (https://app.aihub.qualcomm.com/docs/hub/inference_examples.html). The export code does report the PSNR between the 32-bit and 8-bit output logits, but I’m more interested in task-specific metrics such as mAP. Is there any resource where I can find such results for quantized SOTA models? Results for models incorporating Multi-Head Attention (MHA) would be especially helpful.
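For reference, by "PSNR between the logits" I mean something like the following minimal numpy sketch (my own approximation of the metric, not the export code's exact implementation):

```python
import numpy as np

def psnr(ref, test, peak=None):
    """PSNR in dB between a reference tensor (e.g., fp32 logits) and a
    reconstruction (e.g., dequantized int8 logits). Higher is better."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0.0:
        return float("inf")  # identical tensors
    if peak is None:
        # Use the reference's peak magnitude as the signal peak.
        peak = float(np.abs(ref).max())
    return 10.0 * np.log10(peak ** 2 / mse)
```

My point is that a high logit PSNR does not directly tell me what happens to mAP, which is why I'm asking for task metrics.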

Q2) I’m working on a custom multi-task model with MHA modules, but I’m seeing significant accuracy drops when quantizing from ONNX to QNN with the QNN SDK. I understand some degradation is expected, but none of the techniques I’ve tried (e.g., AIMET AdaRound, QAT, PTQ with various schemes) has been effective for my model.
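To illustrate the kind of failure mode I suspect: attention logits often contain outliers, and a single outlier inflates a per-tensor int8 scale, so the post-softmax probabilities shift noticeably. A minimal numpy simulation (a generic symmetric per-tensor scheme, not necessarily what QNN applies):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerically stable
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fake_quant_int8(x):
    # Symmetric per-tensor int8 quantize -> dequantize round trip.
    # This is only a stand-in scheme for illustration.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 16))
logits[0, 0] = 40.0  # one outlier stretches the whole int8 range

p_fp32 = softmax(logits)
p_int8 = softmax(fake_quant_int8(logits))
max_prob_err = np.abs(p_fp32 - p_int8).max()
print(f"max softmax error after fake int8 quant: {max_prob_err:.4f}")
```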

It seems that for simpler models like ViT, PTQ on the hub is sufficient, but for pre-compiled models such as the quantized Stable Diffusion v1.5, it appears additional steps were taken locally before uploading to the hub. Could you clarify which techniques or processes were applied during the local pre-compilation stage to achieve better 8-bit accuracy?
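To make the question concrete: one candidate step I have wondered about is per-channel weight quantization instead of per-tensor. A toy numpy comparison (purely illustrative; I am not assuming this is what was done for the SD upload):

```python
import numpy as np

def quant_dequant(x, scale):
    # int8 quantize -> dequantize with a given scale (scalar or per-row).
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
# Projection weights whose output channels (rows) span very different
# ranges -- common in attention/projection layers.
w = rng.standard_normal((8, 64)) * np.logspace(-2, 1, 8)[:, None]

# Per-tensor: a single scale for the whole matrix.
s_tensor = np.abs(w).max() / 127.0
mse_tensor = np.mean((w - quant_dequant(w, s_tensor)) ** 2)

# Per-channel: one scale per output channel.
s_channel = np.abs(w).max(axis=1, keepdims=True) / 127.0
mse_channel = np.mean((w - quant_dequant(w, s_channel)) ** 2)

print(f"per-tensor MSE:  {mse_tensor:.3e}")
print(f"per-channel MSE: {mse_channel:.3e}")
```

If steps like this (or something else entirely, e.g. calibration-data selection or mixed precision) were part of the local pipeline, knowing which ones would help a lot.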

ViT:

model init -> compile on hub (where PTQ is implemented) -> profile

(screenshot attached)

SD v1.5 quantized:

model init -> upload a pre-compiled model to the hub -> profile

(screenshots attached)

Thanks!
