
Is Tensile adapted to RDNA2 ? #1579

Closed
v01dXYZ opened this issue Aug 29, 2022 · 2 comments
Comments

@v01dXYZ

v01dXYZ commented Aug 29, 2022

Hello,
As you may know, RDNA2 (in its Navi 21 configuration) has a 128 MB Infinity Cache, effectively an L3 cache. This is an important difference from the GCN/CDNA architectures. It allows efficient use of a memory subsystem with a narrower bus: 8 Samsung GDDR6 chips (8 × 32 bits × 16 Gbps = 512 GB/s), which still has higher throughput than a Vega 10. Are Tensile or MISA adapted to a microarchitecture where caching (i.e. spatial/temporal locality) is central to achieving peak performance?
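The bandwidth comparison above can be checked with the usual peak-bandwidth formula (bus width in bits × per-pin data rate ÷ 8). This is a minimal sketch: the GDDR6 figures are the ones given in the post, while the Vega 64 figures (2048-bit HBM2 at about 1.89 Gbps per pin) are an assumption added for comparison, and `peak_bandwidth_gbs` is a hypothetical helper.

```python
# Peak DRAM bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Return theoretical peak bandwidth in GB/s (hypothetical helper)."""
    return bus_width_bits * gbps_per_pin / 8


# RDNA2 (Navi 21) as described in the post: 8 GDDR6 chips, 32 bits each, 16 Gbps per pin.
rdna2_gbs = peak_bandwidth_gbs(8 * 32, 16.0)

# Assumption for comparison: Vega 64 (Vega 10), 2048-bit HBM2 at ~1.89 Gbps per pin.
vega64_gbs = peak_bandwidth_gbs(2048, 1.89)

if __name__ == "__main__":
    print(f"RDNA2 (256-bit GDDR6): {rdna2_gbs:.0f} GB/s")
    print(f"Vega 64 (2048-bit HBM2): {vega64_gbs:.1f} GB/s")
```

Under these assumptions this gives 512 GB/s for the 256-bit GDDR6 setup versus roughly 484 GB/s for Vega 64, which is consistent with the claim that the narrower bus still out-delivers Vega 10 on paper.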
Do you think RDNA2 could match or even beat a GCN/CDNA architecture for GEMM by keeping blocks resident in the L3 cache for as long as possible? With 128 MB / 160 wavefronts ≈ 800 KB per wavefront (160 wavefronts = 80 CUs × 2 SIMD32 units per CU, each running one 32-lane wavefront at a time), that is not far from the per-core L2 cache found on CPUs (Ryzen 5000 series: 512 KB L2 per core).
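The per-wavefront arithmetic can be sketched as follows, together with a classic cache-blocking rule of thumb: pick the largest square block size n such that one n × n block each of A, B and C (for C += A·B) fits in the budget. That rule is swapped in here purely for illustration; it is not Tensile's or MISA's actual tiling logic. The helpers `cache_share_bytes` and `max_square_tile` are hypothetical, and splitting a shared cache evenly across wavefronts is the post's simplification, not how the hardware allocates it.

```python
import math

CACHE_BYTES = 128 * 1024 * 1024  # 128 MB Infinity Cache (Navi 21)
WAVEFRONTS = 80 * 2              # post's assumption: 80 CUs x 2 wave32 at a time per CU
ZEN3_L2_BYTES = 512 * 1024       # Ryzen 5000 per-core L2, for comparison


def cache_share_bytes(cache_bytes: int, wavefronts: int) -> int:
    """Bytes per wavefront if the cache were split evenly (a simplification)."""
    return cache_bytes // wavefronts


def max_square_tile(budget_bytes: int, tiles: int = 3, elem_bytes: int = 4) -> int:
    """Largest n such that `tiles` n x n blocks of `elem_bytes`-byte elements fit.

    tiles=3 models one block each of A, B and C in C += A @ B (fp32 by default).
    """
    return math.isqrt(budget_bytes // (tiles * elem_bytes))


share = cache_share_bytes(CACHE_BYTES, WAVEFRONTS)
n = max_square_tile(share)

if __name__ == "__main__":
    print(f"per-wavefront share: {share / 1024:.1f} KiB "
          f"({share / ZEN3_L2_BYTES:.2f}x a Zen 3 L2)")
    print(f"largest fp32 n x n blocking that fits: n = {n}")
```

With these inputs the share is about 819 KiB (roughly 1.6× a Zen 3 L2) and the rule yields n = 264 for fp32. In practice the Infinity Cache is shared by all CUs, and GPU GEMM kernels also tile at the register and LDS levels, so this is only a rough upper bound on useful block sizes.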

@bragadeesh
Contributor

Yes, Tensile supports RDNA2. Assigning this to @TonyYHsieh for further support.

@ppanchad-amd

@v01dXYZ Do you still need assistance with this ticket? If not, please close the ticket. Thanks!

v01dXYZ closed this as completed Aug 16, 2024