Background
While implementing grouped matmul, we need to transfer only part of the data held in L0C to GM.
Root Cause
The current copy interface derives the transfer size directly from the shape of src and ignores the extent, so it always copies the whole src buffer and a partial transfer cannot be expressed.
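
The behavior described above can be modeled with a minimal sketch in plain Python. Everything here is hypothetical and for illustration only: buffers are nested lists, and copy_full_shape, l0c_tile and gm_out are made-up names, not the real copy interface or real on-chip buffers.

```python
# Hypothetical model only: buffers are 2-D lists, not real L0C/GM memory.
def copy_full_shape(src, dst):
    """Copy every element of src into dst; the size comes only from src's shape."""
    rows, cols = len(src), len(src[0])
    for r in range(rows):
        for c in range(cols):
            dst[r][c] = src[r][c]


# A 4x4 tile standing in for L0C; assume only the first 2 rows are valid output.
l0c_tile = [[r * 4 + c for c in range(4)] for r in range(4)]
# A 4x4 region standing in for GM, pre-filled with -1 as a sentinel.
gm_out = [[-1] * 4 for _ in range(4)]

copy_full_shape(l0c_tile, gm_out)

# No sentinel survives: the caller had no way to limit the copy to 2 rows.
untouched_rows = [row for row in gm_out if row == [-1] * 4]
```

In this model the caller cannot restrict the transfer to the valid rows, which is the limitation that grouped matmul runs into.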
Proposal
Update the copy interface so that partial data transfer is possible, in one of two ways: support slice syntax for the src argument, or add a parameter with a default value that specifies the copy size.
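
The two options can be compared with a minimal sketch in plain Python. This is an illustration under stated assumptions, not the actual interface: the copy function, its size parameter, and the tile_slice helper are hypothetical names, and nested lists stand in for the L0C and GM buffers.

```python
# Hypothetical model only: names and signatures are assumptions for illustration.
def copy(src, dst, size=None):
    """Copy a (rows, cols) region from src to dst.

    When size is None, fall back to the full shape of src, which keeps
    existing call sites working unchanged.
    """
    rows, cols = size if size is not None else (len(src), len(src[0]))
    for r in range(rows):
        for c in range(cols):
            dst[r][c] = src[r][c]


def tile_slice(tile, row_slice, col_slice):
    """Stand-in for slice syntax on src: return the selected sub-tile."""
    return [row[col_slice] for row in tile[row_slice]]


# A 4x4 tile standing in for L0C; assume only the first 2 rows should reach GM.
l0c_tile = [[r * 4 + c for c in range(4)] for r in range(4)]

# Option 1: the caller slices src, and copy derives the size from the slice.
gm_sliced = [[-1] * 4 for _ in range(4)]
copy(tile_slice(l0c_tile, slice(0, 2), slice(0, 4)), gm_sliced)

# Option 2: the caller passes the full tile plus an explicit copy size.
gm_sized = [[-1] * 4 for _ in range(4)]
copy(l0c_tile, gm_sized, size=(2, 4))
```

In this model both options produce the same result and leave the last two rows of the destination untouched. The slice form keeps the size tied to the src expression, while the size parameter leaves existing calls unchanged because it defaults to the full shape.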