
TinyEngine convolutional layer has greater latency than ARM's CMSIS-NN #71

Open

ellial opened this issue Apr 2, 2023 · 1 comment
ellial commented Apr 2, 2023

Hello,

I was measuring the latency of one of TinyEngine's convolutional kernels (convolve_s8_kernel3_stride1_pad1) against CMSIS-NN's fast convolutional kernel (arm_convolve_HWC_q7_fast). The TinyEngine kernel had a latency of approximately 200,000 cycles, while the CMSIS-NN kernel had a latency of approximately 130,000 cycles.

  • Is the additional overhead due to TinyEngine's per-channel requantization? Could you explain why per-channel requantization is needed in the kernel?
  • Have you tried benchmarking the two frameworks' latencies per kernel? If so, could you share the results?

Thank you in advance.

meenchen (Contributor) commented Apr 4, 2023

Hi @ellial,

convolve_s8_kernel3_stride1_pad1 is a deprecated kernel and is not actively used in TinyEngine. For 3x3 convolution kernels, we use https://github.com/mit-han-lab/tinyengine/blob/main/TinyEngine/src/kernels/int_forward_op/convolve_u8_kernel3_inputch3_stride2_pad1.c instead. Please also note that for MobileNet-like models, most of the computation goes to pointwise and depthwise convolutions.
