Support for Phi-3 MLP layer #84
…hByrneIntel/intel-npu-acceleration-library into sarah/feature/phi3MLP_layer
    if isinstance(model, Phi3MLP):
        # Apply optimizations to a single MLP block model
        model = model

    if dtype in (int8, int4):
        # Quantize model
        model = quantize_model(model, dtype)
        weights_quantization(model)
Why is there a specific branch for Phi3MLP?
If only a single MLP block is passed in to be compiled, we don't want to pass it to the recursive function, because that would break it down into its individual layers. When the block is contained within a larger model, it is the model that gets broken down, and the NPUModuleWrapper check prevents the blocks themselves from being broken down. That check never runs when the input is just a single block, hence the dedicated branch.
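To make the reasoning concrete, here is a minimal sketch in plain Python. It is not the library's code: `Module`, `Linear`, `MLPBlock`, `NPUWrapper`, `NPULinear`, `optimize_recursive` and `compile_model` are all hypothetical stand-ins (for `torch.nn.Module`, `Phi3MLP`, the wrapper check, and so on), kept free of dependencies so the control flow is easy to follow.

```python
class Module:
    """Hypothetical stand-in for torch.nn.Module: a tree of named children."""

    def __init__(self, **children):
        self.children = dict(children)


class Linear(Module):
    """Hypothetical leaf layer."""


class NPULinear(Module):
    """Hypothetical optimized replacement for a single Linear layer."""


class MLPBlock(Module):
    """Hypothetical stand-in for a block such as Phi3MLP."""


class NPUWrapper(Module):
    """Hypothetical marker: the wrapped block is optimized as one unit."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner


def optimize_recursive(module):
    """Walk the children of `module`; the root itself is never inspected."""
    for name, child in list(module.children.items()):
        if isinstance(child, NPUWrapper):
            continue  # already handled as a whole block, do not descend
        if isinstance(child, MLPBlock):
            module.children[name] = NPUWrapper(child)  # keep the block intact
        elif isinstance(child, Linear):
            module.children[name] = NPULinear()  # per-layer replacement
        else:
            optimize_recursive(child)


def compile_model(model):
    """Assumed top-level entry point for this sketch."""
    if isinstance(model, MLPBlock):
        # A lone block is the root, so the recursion would only ever see
        # its layers. Handle it here instead of recursing.
        return NPUWrapper(model)
    optimize_recursive(model)
    return model
```

In this sketch, calling `compile_model` on a bare `MLPBlock` returns it wrapped with its layers untouched, whereas calling `optimize_recursive` on it directly would replace its `Linear` children one by one, which is the situation the dedicated branch avoids.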
A few minor things to change, but in general very good.
LGTM
Adding support for Phi-3 MLP layer
Update compile functionality for model blocks
Add Phi-3 MLP optimization
Add testing for Phi-3 MLP
Add type operation for tensor dtype conversion
Implement new forward function for quantized models
Add toggling for model profiling
Add compiler configuration feature
Update tests and examples for compiler config
Update doc on usage