Is Tensile adapted to RDNA2? #1579
Comments
Yes, Tensile has support for RDNA2. Assigning this to @TonyYHsieh for further support.
@v01dXYZ Do you still need assistance with this ticket? If not, please close the ticket. Thanks!
Hello,
As you may know, RDNA2 has a 128 MB L3 cache, which is an important difference from the GCN/CDNA architectures. It makes it possible to use a memory subsystem with a narrower bus efficiently (although its throughput is still higher than that of a Vega 10) with 8 Samsung GDDR6 chips (8 x 32 bits x 16 Gbps). Are Tensile or MISA adapted to a microarchitecture where caching (i.e. spatial/temporal locality) is central to achieving peak performance?
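To make the memory figures above concrete, here is a minimal sketch of the arithmetic, assuming the 8 x 32 x 16 Gbps configuration quoted above (8 chips, a 32-bit interface per chip, 16 Gbps per pin). The function name `peak_bandwidth_gb_per_s` is only illustrative.

```python
def peak_bandwidth_gb_per_s(chips: int, bits_per_chip: int, gbps_per_pin: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    total_gbits_per_s = chips * bits_per_chip * gbps_per_pin
    return total_gbits_per_s / 8  # 8 bits per byte


# Figures taken from the post: 8 GDDR6 chips, 32 bits each, 16 Gbps per pin.
bus_width_bits = 8 * 32
bandwidth = peak_bandwidth_gb_per_s(chips=8, bits_per_chip=32, gbps_per_pin=16.0)

print(f"bus width: {bus_width_bits} bits")
print(f"peak bandwidth: {bandwidth:.0f} GB/s")
```

This is a theoretical peak only; sustained bandwidth depends on access patterns and on how often requests hit in the L3 cache.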
Do you think RDNA2 could be as good as or even better than a GCN/CDNA architecture for GEMM by keeping blocks in the L3 cache for as long as possible? We have 128 MB / 160 wavefronts ~= 800 KB per wavefront (160 wavefronts = 80 CUs * 2 concurrent 32-lane wavefronts per CU). That is not far from the L2 cache found on CPUs (Ryzen 5xxx series: 512 KB L2 cache).
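The per-wavefront estimate can be reproduced with a short sketch. It assumes the numbers stated above (80 CUs, 2 concurrent wavefronts per CU, 128 MB of L3 cache) and an even split of the cache across wavefronts, which is a simplification: the hardware does not partition the cache this way. The constant names are illustrative.

```python
# Figures as stated in the post (assumptions, not measured values).
NUM_CUS = 80
WAVEFRONTS_PER_CU = 2      # concurrent 32-lane wavefronts per CU
L3_CACHE_MB = 128
RYZEN_L2_KB = 512          # Ryzen 5xxx L2 size quoted in the post

wavefronts = NUM_CUS * WAVEFRONTS_PER_CU

# Even split of the L3 cache, in decimal (KB) and binary (KiB) units.
share_kb = L3_CACHE_MB * 1000 / wavefronts
share_kib = L3_CACHE_MB * 1024 / wavefronts

print(f"wavefronts: {wavefronts}")
print(f"L3 share per wavefront: {share_kb:.1f} KB ({share_kib:.1f} KiB)")
print(f"ratio to a 512 KB CPU L2: {share_kb / RYZEN_L2_KB:.2f}x")
```

Whether the quoted sizes are decimal or binary changes the share only slightly, so the rough "~800 KB per wavefront" figure holds either way.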