Support for Phi-3 MLP layer #84

SarahByrneIntel · 2024-07-02T09:27:53Z

Adding support for Phi-3 MLP layer

Update compile functionality for model blocks
Add Phi-3 MLP optimization
Add testing for Phi-3 MLP
Add type operation for tensor dtype conversion
Implement new forward function for quantized models
Add toggling for model profiling
Add compiler configuration feature
Update tests and examples for compiler config
Update doc on usage

intel_npu_acceleration_library/nn/module.py

alessandropalla · 2024-07-17T13:03:56Z

intel_npu_acceleration_library/compiler.py

+        if isinstance(model, Phi3MLP):
+            # Apply optimizations to a single MLP block model
+            model = model
+
+            if dtype in (int8, int4):
+                # Quantize model
+                model = quantize_model(model, dtype)
+                weights_quantization(model)


Why is there a specific branch for Phi3MLP?

If only a single MLP block is passed in to be compiled, we don't want to pass it to the recursive function, as that would break the block down into its individual layers. When the block is contained within a larger model, it is the model that gets broken down, and the NPUModuleWrapper check prevents the blocks themselves from being broken down. That check never comes into play, however, when the input is just a single block.
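The reasoning above can be illustrated with a minimal, library-free sketch. Every name in it (`Module`, `Linear`, `MLPBlock`, `NPULinear`, `FusedMLP`, `optimize_children`, `compile_model`) is a hypothetical stand-in, not the library's actual code; the `isinstance` check inside the loop only plays the role that the NPUModuleWrapper check plays in the discussion. The point it shows is that a recursion over children never inspects the root module, so a bare block needs its own branch at the entry point.

```python
class Module:
    """Minimal stand-in for torch.nn.Module (hypothetical, illustration only)."""

    def __init__(self, **children):
        self.children = dict(children)


class Linear(Module):
    """Stand-in for a plain linear layer."""


class MLPBlock(Module):
    """Stand-in for Phi3MLP: a block built from two linear layers."""

    def __init__(self):
        super().__init__(gate_up_proj=Linear(), down_proj=Linear())


class NPULinear(Module):
    """Stand-in for a single layer offloaded to the NPU."""


class FusedMLP(Module):
    """Stand-in for a whole MLP block compiled as one NPU unit."""


def optimize_children(model):
    """Recursively replace the children of `model`; `model` itself is never inspected."""
    for name, child in list(model.children.items()):
        if isinstance(child, MLPBlock):
            # Keep the block whole instead of descending into its layers.
            model.children[name] = FusedMLP()
        elif isinstance(child, Linear):
            model.children[name] = NPULinear()
        else:
            optimize_children(child)


def compile_model(model):
    """Entry point: handle a bare block first, otherwise recurse over children."""
    if isinstance(model, MLPBlock):
        return FusedMLP()
    optimize_children(model)
    return model
```

Under these assumptions, `compile_model(Module(mlp=MLPBlock(), head=Linear()))` keeps `mlp` as one `FusedMLP` and converts `head` to `NPULinear`, whereas calling `optimize_children(MLPBlock())` directly splits the block into two `NPULinear` layers, which is the outcome the dedicated branch avoids.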

alessandropalla

A few minor things to change, but in general very good.

intel_npu_acceleration_library/compiler.py

alessandropalla

LGTM

SarahByrneIntel and others added 10 commits July 1, 2024 16:56

Add support for phi-3 MLP layer

22c627c

Updating support for Phi-3 MLP

ea4b27a

Update for Phi-3 MLP testing

39c070c

Merge branch 'main' into sarah/feature/phi3MLP_layer

2042fab

Merge branch 'intel:main' into sarah/feature/phi3MLP_layer

5660cc3

Update for phi-3 mlp layer

727454e

Merge branch 'sarah/feature/phi3MLP_layer' of https://github.com/Sara…

00a64f0

…hByrneIntel/intel-npu-acceleration-library into sarah/feature/phi3MLP_layer

Merge branch 'intel:main' into sarah/feature/phi3MLP_layer

100fe88

Remove old code for phi-3 mlp layer

ea4ea19

Merge branch 'sarah/feature/phi3MLP_layer' of https://github.com/Sara…

53c7b0d

…hByrneIntel/intel-npu-acceleration-library into sarah/feature/phi3MLP_layer

alessandropalla mentioned this pull request Jul 11, 2024

Enable graph mode for LLM inference #89

Open

SarahByrneIntel and others added 6 commits July 12, 2024 09:55

Add type tensor op and quantisation support

1fef8a4

add support for model quantisation and code clean up

cc5d373

Merge branch 'main' into sarah/feature/phi3MLP_layer

ff47c1d

Fix for model quantization

d2fe9fe

Add testing for phi-3 mlp quantisation

b7825e7

Add phi-3 mlp test and enable model profiling toggling

c652859

alessandropalla reviewed Jul 17, 2024

intel_npu_acceleration_library/nn/module.py (outdated)

Update for model profiling toggle

786c663

SarahByrneIntel changed the title ~~Sarah/feature/phi3 mlp layer~~ Adding support for Phi-3 MLP layer Jul 17, 2024

SarahByrneIntel changed the title ~~Adding support for Phi-3 MLP layer~~ Support for Phi-3 MLP layer Jul 17, 2024

alessandropalla reviewed Jul 17, 2024

SarahByrneIntel and others added 6 commits July 18, 2024 08:40

Add compile config feature

003d639

Fix test for compile config and remove old code

c63c223

Fix tests with compile config

e652eaa

Fix for compiler, updates for tests and examples, doc update

7f2faf9

Update for model examples and remove test code

4b5f857

Merge branch 'main' into sarah/feature/phi3MLP_layer

2718e13

alessandropalla requested changes Jul 19, 2024

intel_npu_acceleration_library/compiler.py (outdated)

intel_npu_acceleration_library/compiler.py (outdated)

Fix for quantization and remove unused code

ae1fd61

SarahByrneIntel added 2 commits July 19, 2024 11:07

Merge branch 'sarah/feature/phi3MLP_layer' of https://github.com/Sara…

5d578a1

…hByrneIntel/intel-npu-acceleration-library into sarah/feature/phi3MLP_layer

Update for quantization of a model

2890299

alessandropalla self-requested a review July 19, 2024 12:27

alessandropalla approved these changes Jul 19, 2024

alessandropalla merged commit 2193535 into intel:main Jul 19, 2024
6 checks passed
