Add ADAQUANT quantization scheme #3628
base: main
Conversation
Force-pushed from 136869e to cb5940e.
Overall this looks good! I think it makes sense to include it as an option in our existing quantization filters. Regarding compression, we will discuss and decide whether we want to keep it here or make it more general.
My general opinion is that we can refactor compression into an orthogonal optimization component. However, due to the deadline of my Google Summer of Code project, let's do that in later PRs and leave this one unchanged.
Thanks for your contribution @cyyever. Let's wait until we have the 2.7 release branch before merging this into main. @yanchengnv, @chesterxgchen for visibility.
Force-pushed from 0204f0d to 2717fd1.
One question about the quantized model and metadata size.
Description
This PR adds a new quantization scheme, ADAQUANT, as introduced in the paper "Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning".
ADAQUANT converts float tensors into integer tensors. Combined with an additional compression step that packs the low-bit integers, it can reach a nearly 10x compression ratio, as shown in the test results below.
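For illustration, here is a minimal sketch of the two steps described above (quantize a float tensor to low-bit integers, then pack two 4-bit values per byte). This is not the implementation added by this PR; the symmetric per-tensor scheme and all function names are assumptions made for the example.

```python
import numpy as np

def quantize_4bit(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization of floats to signed 4-bit values in [-8, 7].

    Illustrative only; the ADAQUANT filter in this PR may differ.
    """
    scale = float(np.abs(x).max()) / 7.0
    if scale == 0.0:  # all-zero tensor: avoid division by zero
        scale = 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def pack_4bit(q: np.ndarray) -> np.ndarray:
    """Pack two 4-bit values into each byte (the extra compression step)."""
    nibbles = (q.astype(np.int16) & 0x0F).astype(np.uint8).ravel()
    if nibbles.size % 2:  # pad odd-length tensors with one zero nibble
        nibbles = np.concatenate([nibbles, np.zeros(1, dtype=np.uint8)])
    return (nibbles[0::2] << 4) | nibbles[1::2]

def unpack_dequantize(packed: np.ndarray, scale: float, numel: int) -> np.ndarray:
    """Reverse both steps: split the bytes back into nibbles, sign-extend, rescale."""
    nibbles = np.empty(packed.size * 2, dtype=np.int8)
    nibbles[0::2] = (packed >> 4).astype(np.int8)
    nibbles[1::2] = (packed & 0x0F).astype(np.int8)
    nibbles = np.where(nibbles > 7, nibbles - 16, nibbles)  # sign-extend 4-bit values
    return nibbles[:numel].astype(np.float32) * scale
```

Round-tripping a tensor through these three functions reconstructs it up to quantization error; the per-tensor scale is the only metadata this sketch needs, and the packing step halves the storage compared to keeping one 4-bit value per byte.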
These results were obtained by running under `NVFlare/examples/advanced/llm_hf` with the command `./runtest.sh`.

Types of changes