Is your feature request related to a problem? Please describe.
I am trying to retarget the llm artifacts to my own FPGA board. I'd like to regenerate the HLS code to try more aggressive quantization schemes.
Describe the solution you'd like
Please add some small examples of advanced optimization techniques that are used in the pldi24-artifact repo.
- Mixed precision input/output for GEMM
- Mixed precision activation/weight for GEMM
- Mixed precision input/output for Softmax/LayerNorm/Residual
- Low-bit packing input/output for GEMM/Softmax
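To make the mixed-precision GEMM request concrete, here is a minimal plain-Python sketch of the arithmetic pattern being asked for: int8 activations multiplied by int4 weights with a wide 32-bit accumulator (the software analogue of `ap_int<8>`/`ap_int<4>`/`ap_int<32>` in an HLS kernel). This is an illustration of the quantization scheme only, not code from the artifact or from Allo.

```python
def mixed_precision_gemm(A, W):
    """Simulate a mixed-precision GEMM: int8 activations x int4 weights,
    accumulated in 32-bit integers so partial sums cannot overflow.
    A is n x k, W is k x m; both are lists of lists of ints."""
    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    n, k, m = len(A), len(W), len(W[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # wide accumulator
            for p in range(k):
                a8 = clamp(A[i][p], -128, 127)  # int8 activation range
                w4 = clamp(W[p][j], -8, 7)      # int4 weight range
                acc += a8 * w4
            C[i][j] = acc
    return C

C = mixed_precision_gemm([[10, -3], [127, 5]], [[7, -8], [2, 3]])
```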
Additional context
For example, the softmax operator requires the same fp32 datatype for both input and output. However, there is a mixed precision HLS implementation with input/output packing in the artifact code here. I searched the Allo repo and could not find a reference of how to generate such code.
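For clarity on what a mixed-precision softmax interface could look like, below is a hedged Python sketch: quantized int8 input, float internal math, and an 8-bit output mapping [0, 1] to [0, 255]. The scale parameter and the output encoding here are assumptions for illustration; the artifact kernel additionally packs several low-bit values per memory word, which this sketch does not model.

```python
import math

def softmax_int8_io(x_q, in_scale):
    """Sketch of a mixed-precision softmax: int8 quantized input,
    float internal computation, uint8 output (probability * 255).
    x_q is a list of int8 values; in_scale is the dequantization scale."""
    x = [v * in_scale for v in x_q]       # dequantize input to float
    m = max(x)                            # subtract max for stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    # Requantize probabilities to uint8 in [0, 255].
    return [min(255, max(0, round(p / s * 255))) for p in e]

y = softmax_int8_io([0, 0, 0, 0], in_scale=0.05)
```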
Hi @bibo-msft, thanks for raising the issue! The PLDI'24 artifact was not purely generated by Allo. There are some manual hacks in the kernels, and we are still working on automating the process.
Currently, we have a script for generating the Transformer kernels. Please check out this page for the instructions. This test case also shows a low-bit packing example of GEMM. You can change the bitwidths in the type parameters to generate different GEMM kernels.
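To illustrate what "low-bit packing" means in this context, here is a small standalone Python sketch (not the generated HLS code): eight signed int4 values packed into one 32-bit word, so a single wide memory access carries eight elements. An HLS kernel would do the same with bit slicing on an `ap_uint<32>`.

```python
def pack_int4(vals):
    """Pack eight signed int4 values (range -8..7) into one 32-bit word,
    with element 0 in the lowest nibble."""
    word = 0
    for i, v in enumerate(vals):
        assert -8 <= v <= 7, "value out of int4 range"
        word |= (v & 0xF) << (4 * i)   # keep low 4 bits, shift into place
    return word

def unpack_int4(word):
    """Inverse of pack_int4: recover eight signed int4 values."""
    out = []
    for i in range(8):
        nib = (word >> (4 * i)) & 0xF
        out.append(nib - 16 if nib >= 8 else nib)  # sign-extend the nibble
    return out

packed = pack_int4([1, -2, 3, -4, 5, -6, 7, -8])
```

Round-tripping through `unpack_int4` recovers the original values, which is the invariant a packed GEMM/softmax interface relies on.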
We will provide additional examples of mixed precision kernels soon and will notify you once they are available. Please feel free to share any other suggestions you may have. Thank you!