Introduces support for 3D and 4D inputs in ACL acl_lowp and acl_lowp_sq matmul. #2846
Description
This PR introduces 3D and 4D support for acl_lowp_matmul_t and acl_lowp_sq_matmul_t in ACL. Solves issue.
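To make the 3D/4D case concrete, here is a minimal reference sketch of a batched matmul in which weights batch dimensions of size 1 are broadcast across the source batch, as in the 4x1024x1024:1x1024x1024 problems benchmarked below. It is an illustration only, not the ACL or oneDNN implementation: the helper names batched_matmul and flat_index are hypothetical, tensors are assumed to be flat row-major lists, and quantization (scales, zero points, saturation) is left out.

```python
from itertools import product


def flat_index(idx, dims):
    """Row-major offset of batch index `idx` in `dims`; size-1 dims are broadcast."""
    off = 0
    for i, d in zip(idx, dims):
        off = off * d + (i if d > 1 else 0)
    return off


def batched_matmul(src, src_shape, wei, wei_shape):
    """Reference matmul for (*batch, M, K) x (*batch, K, N) on flat row-major lists.

    Illustrative only: integer accumulation, no scales or zero points.
    """
    *src_batch, M, K = src_shape
    *wei_batch, K2, N = wei_shape
    assert K == K2, "inner dimensions must match"
    assert len(src_batch) == len(wei_batch), "batch ranks must match"
    for s, w in zip(src_batch, wei_batch):
        assert s == w or s == 1 or w == 1, "batch dims must match or be 1"

    out_batch = [max(s, w) for s, w in zip(src_batch, wei_batch)]
    dst = []
    # Each batch index selects one 2D (M x K) by (K x N) problem.
    for idx in product(*(range(d) for d in out_batch)):
        src_off = flat_index(idx, src_batch) * M * K
        wei_off = flat_index(idx, wei_batch) * K * N
        for m in range(M):
            for n in range(N):
                acc = 0
                for k in range(K):
                    acc += src[src_off + m * K + k] * wei[wei_off + k * N + n]
                dst.append(acc)
    return dst, (*out_batch, M, N)


# Tiny 3D example: two source batches share one weights matrix.
dst, dst_shape = batched_matmul([1, 2, 3, 4], (2, 1, 2), [10, 100], (1, 2, 1))
print(dst, dst_shape)  # [210, 430] (2, 1, 1)
```

The same function handles the 4D case by passing two leading batch dimensions, for example shapes (4, 4, M, K) and (1, 1, K, N).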
General
Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
Performance improvements
Below are small test case logs that demonstrate performance numbers before and after.
OMP_NUM_THREADS=16 ./tests/benchdnn/benchdnn --mode=pc --matmul --dt=s8:s8:f32 4x1024x1024:1x1024x1024
ACL: total perf: min(ms):1.53076 avg(ms):1.55271
JIT: total perf: min(ms):1029.39 avg(ms):1029.92
OMP_NUM_THREADS=16 ./tests/benchdnn/benchdnn --mode=pc --matmul --dt=s8:s8:s8 4x1024x1024:1x1024x1024
ACL: total perf: min(ms):1.38965 avg(ms):1.4086
JIT: total perf: min(ms):1032.73 avg(ms):1033.18
OMP_NUM_THREADS=16 ./tests/benchdnn/benchdnn --mode=pc --matmul --dt=s8:s8:f32 4x4x1024x1024:1x1x1024x1024
ACL: total perf: min(ms):5.95874 avg(ms):5.9957
JIT: total perf: min(ms):4003.9 avg(ms):4005.17
OMP_NUM_THREADS=16 ./tests/benchdnn/benchdnn --mode=pc --matmul --dt=s8:s8:s8 4x4x1024x1024:1x1x1024x1024
ACL: total perf: min(ms):5.31323 avg(ms):5.34218
JIT: total perf: min(ms):4022.92 avg(ms):4023.37
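For a quick read of the logs above, the following sketch computes the JIT-to-ACL ratio of the reported average times. The avg(ms) values are copied from the logs; the dictionary layout and the speedups helper are hypothetical, and the sketch assumes the JIT lines are the "before" numbers and the ACL lines the "after" numbers.

```python
# avg(ms) values copied from the benchdnn logs: (ACL, JIT) per test case.
AVG_MS = {
    "s8:s8:f32 4x1024x1024:1x1024x1024": (1.55271, 1029.92),
    "s8:s8:s8 4x1024x1024:1x1024x1024": (1.4086, 1033.18),
    "s8:s8:f32 4x4x1024x1024:1x1x1024x1024": (5.9957, 4005.17),
    "s8:s8:s8 4x4x1024x1024:1x1x1024x1024": (5.34218, 4023.37),
}


def speedups(avg_ms):
    """Return {case: jit_ms / acl_ms} for each benchmarked case."""
    return {case: jit / acl for case, (acl, jit) in avg_ms.items()}


for case, ratio in speedups(AVG_MS).items():
    print(f"{case}: ACL is about {ratio:.0f}x faster than JIT (avg)")
```

For all four cases the ratio is in the hundreds, so the gap is far larger than the run-to-run spread between the min and avg values.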