Hi,
It seems to me that example_gemm_rs.py is incomplete. It calls into gemm_rs_op, but the actual implementations (gemm_rs_producer_persistent and gemm_rs_producer_non_persistent) are missing or commented out.
It would be great to have a complete GEMM+RS (GEMM followed by reduce-scatter) example, since this is a common pattern in distributed LLM training and inference. Is there a roadmap for this?
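To be clear about the semantics I have in mind, here is a minimal single-process sketch of what I would expect a GEMM+RS example to compute. It is only a reference for the math, not the repository's implementation: ranks are simulated with plain lists, there is no overlap of compute and communication, and the names `matmul` and `gemm_rs_reference` are hypothetical. I am assuming the usual layout where A and B are sharded along K across ranks and the output is scattered along M.

```python
# Single-process reference for GEMM + reduce-scatter; ranks are simulated with lists.
# All names here are hypothetical and not taken from the repository.

def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gemm_rs_reference(a_shards, b_shards):
    """a_shards[r] is rank r's (M x K_r) slice of A, b_shards[r] its (K_r x N) slice of B.

    Returns one (M / world_size x N) output chunk per rank.
    """
    world_size = len(a_shards)
    # Step 1 (GEMM): each rank computes a full-size partial product from its K slice.
    partials = [matmul(a, b) for a, b in zip(a_shards, b_shards)]
    m, n = len(partials[0]), len(partials[0][0])
    assert m % world_size == 0, "M must divide evenly across ranks"
    rows_per_rank = m // world_size
    # Step 2 (reduce-scatter): sum the partials element-wise; rank r keeps row chunk r.
    outputs = []
    for r in range(world_size):
        chunk = [
            [sum(p[i][j] for p in partials) for j in range(n)]
            for i in range(r * rows_per_rank, (r + 1) * rows_per_rank)
        ]
        outputs.append(chunk)
    return outputs

# A is 2x4 and B is 4x2, both split along K across 2 simulated ranks.
a_shards = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b_shards = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
out = gemm_rs_reference(a_shards, b_shards)
print(out)  # [[[12, 13]], [[18, 19]]]
```

A full example would presumably produce the same per-rank chunks while overlapping the GEMM with the reduce-scatter.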
Thanks.