TinyEngine convolutional layer has greater latency than ARM's CMSIS-NN #71

ellial · 2023-04-02T07:00:11Z

Hello,

I measured the latency of one of TinyEngine's convolution kernels (convolve_s8_kernel3_stride1_pad1) against CMSIS-NN's fast convolution kernel (arm_convolve_HWC_q7_fast). The TinyEngine kernel took approximately 200,000 cycles, while the CMSIS-NN kernel took approximately 130,000 cycles.

Is the additional overhead due to TinyEngine's per-channel requantization? Could you explain why per-channel requantization is needed in the kernel?
Have you benchmarked the two frameworks kernel by kernel? If so, could you share the results?

Thank you in advance.

meenchen · 2023-04-04T17:07:59Z

Hi @ellial,

convolve_s8_kernel3_stride1_pad1 is a deprecated kernel and is not actively used in TinyEngine. For 3x3 convolutions, we use https://github.com/mit-han-lab/tinyengine/blob/main/TinyEngine/src/kernels/int_forward_op/convolve_u8_kernel3_inputch3_stride2_pad1.c instead. Please also note that for MobileNet-like models, most of the computation goes to pointwise and depthwise convolutions.
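
For readers unfamiliar with the per-channel requantization asked about above, here is a minimal sketch of how it differs from a single per-layer shift. It is an illustration only, not code from TinyEngine or CMSIS-NN: the function names are hypothetical, the per-channel scales are assumed to be floats, and rounding is simplified. It does not show that requantization accounts for the measured cycle difference.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int8 range [-128, 127]. */
static int8_t clamp_s8(int32_t v) {
    if (v > 127) return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}

/* Per-layer requantization (hypothetical helper): every output channel
 * shares one power-of-two shift. Assumes the compiler implements >> on
 * negative values as an arithmetic shift, as common toolchains do. */
static void requant_per_tensor(const int32_t *acc, int8_t *out,
                               int channels, int out_shift) {
    for (int c = 0; c < channels; ++c) {
        out[c] = clamp_s8(acc[c] >> out_shift);
    }
}

/* Per-channel requantization (hypothetical helper): each output channel
 * has its own scale, so every accumulator needs a multiply and a rounding
 * step. Float scales are an assumption made for readability. */
static void requant_per_channel(const int32_t *acc, int8_t *out,
                                int channels, const float *scales,
                                int32_t out_zero_point) {
    for (int c = 0; c < channels; ++c) {
        float x = (float)acc[c] * scales[c];
        /* Round half away from zero without depending on libm. */
        int32_t q = (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
        out[c] = clamp_s8(q + out_zero_point);
    }
}
```

The per-channel variant does more work per output value, but it lets each filter use a scale matched to its own weight range, which generally preserves accuracy better when weight magnitudes vary widely across channels.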
