Support for Phi-3 MLP layer #84

Merged

Conversation

Contributor

@SarahByrneIntel SarahByrneIntel commented Jul 2, 2024

Adding support for Phi-3 MLP layer

  • Update compile functionality for model blocks

  • Add Phi-3 MLP optimization

  • Add testing for Phi-3 MLP

  • Add type operation for tensor dtype conversion

  • Implement new forward function for quantized models

  • Add toggling for model profiling

  • Add compiler configuration feature

  • Update tests and examples for compiler config

  • Update doc on usage
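The compiler configuration feature listed above can be pictured with a small sketch. Note that `CompilerConfig` and its fields here are illustrative assumptions inferred from the bullet points, not the library's exact API:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the compiler configuration added in this PR;
# the field names are assumptions, not the library's real interface.
@dataclass
class CompilerConfig:
    dtype: str = "float16"   # target precision, e.g. "int8" or "int4"
    profile: bool = False    # toggle model profiling on or off

def compile_model(model, config: CompilerConfig):
    # A config object keeps compile options together instead of a
    # growing list of keyword arguments on the compile call.
    if config.dtype in ("int8", "int4"):
        # Represent quantization symbolically for this sketch.
        model = ("quantized", model, config.dtype)
    return model

compiled = compile_model("phi3_mlp", CompilerConfig(dtype="int8"))
```

Passing the same config object to tests and examples is what the "Update tests and examples for compiler config" bullet refers to.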

@SarahByrneIntel SarahByrneIntel changed the title Sarah/feature/phi3 mlp layer Adding support for Phi-3 MLP layer Jul 17, 2024
@SarahByrneIntel SarahByrneIntel changed the title Adding support for Phi-3 MLP layer Support for Phi-3 MLP layer Jul 17, 2024
Comment on lines 46 to 53
if isinstance(model, Phi3MLP):
# Apply optimizations to a single MLP block model
model = model

if dtype in (int8, int4):
# Quantize model
model = quantize_model(model, dtype)
weights_quantization(model)
Contributor

Why is there a specific branch for Phi3MLP?

Contributor Author

If only a single MLP block is passed in to be compiled, we don't want to pass it to the recursive function, as that would break it down into its individual layers. When the block is contained within a larger model, it is the model that gets broken down, and the NPUModuleWrapper check prevents the blocks themselves from being split. That check does not apply when a single block is compiled on its own, hence this branch.
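The dispatch logic described above can be sketched as follows. This is a simplified mock, not the actual implementation: `Module`, `Phi3MLP`, and `NPUModuleWrapper` here are stand-in classes illustrating the control flow only.

```python
# Minimal mock of the compile dispatch discussed in this thread.
class Module:
    def __init__(self, *children):
        self.children = list(children)

class Phi3MLP(Module):
    pass

class NPUModuleWrapper(Module):
    pass

compiled = []  # blocks optimized as a whole

def compile_model(model):
    # A standalone Phi3MLP block is optimized as a unit; recursing
    # into it would break it down into its individual layers.
    if isinstance(model, Phi3MLP):
        compiled.append(model)
        return model
    # Inside a larger model, Phi3MLP blocks and wrapped modules are
    # likewise kept intact, while everything else is recursed into.
    for child in model.children:
        if isinstance(child, (Phi3MLP, NPUModuleWrapper)):
            compiled.append(child)
        else:
            compile_model(child)
    return model
```

The top-level `isinstance` check covers the case the wrapper check cannot: a single block passed directly, with no parent model to recurse from.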

@alessandropalla alessandropalla left a comment


A few minor things to change, but in general very good.

intel_npu_acceleration_library/compiler.py (outdated; resolved)
intel_npu_acceleration_library/compiler.py (outdated; resolved)
@alessandropalla alessandropalla self-requested a review July 19, 2024 12:27
@alessandropalla alessandropalla left a comment


LGTM

@alessandropalla alessandropalla merged commit 2193535 into intel:main Jul 19, 2024
6 checks passed