Follow-up on #1735
The current parallelization strategy for non-tensor and tensor bases means that we cannot currently mix them in gen backends.

The fix isn't too bad - we need to make a version of the tensor operator that assumes `t_id_y == 1` by decomposing `t_id_x = a + b * P_1D`. Same tensor contractions in 2D, just a different mapping to threads. For 3D we'll need a new template that extends the 2D approach in the natural way instead of using 2D slabs.

It's straightforward, but I wanted to do this separately so the PR for #1735 doesn't get too big.
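For illustration, here is a minimal sketch of what one 2D contraction could look like under the flat layout. This is not the actual gen-backend code; the name `ContractTransposeX2dFlat`, the `P_1D`/`Q_1D` template parameters, and the argument layout are placeholders, but it shows the idea of recovering the two logical indices from a single `t_id_x` when `t_id_y == 1`.

```cuda
// Sketch only (assumed names/layout, not libCEED's gen-backend code):
// a 2D X-direction transpose contraction written for a flat 1D thread
// layout, i.e. t_id_y == 1, with the two logical indices recovered
// from t_id_x = a + b * P_1D. The contraction body is the same as in
// the 2D-threaded template; only the index -> thread mapping changes.
template <int P_1D, int Q_1D>
__device__ void ContractTransposeX2dFlat(const double *B,  // Q_1D x P_1D 1D basis matrix
                                         const double *U,  // P_1D x Q_1D input slab (shared memory)
                                         double *V) {      // P_1D x P_1D output slab (shared memory)
  const int t_id_x = threadIdx.x;  // flat thread index; blockDim.y is 1
  const int a = t_id_x % P_1D;     // plays the role of t_id_x in the 2D-threaded version
  const int b = t_id_x / P_1D;     // plays the role of t_id_y in the 2D-threaded version
  if (b < P_1D) {                  // guard: the block may carry extra threads
    double sum = 0.0;
    for (int q = 0; q < Q_1D; q++) sum += B[q * P_1D + a] * U[b * Q_1D + q];
    V[b * P_1D + a] = sum;         // V(b, a) = sum_q B(q, a) * U(b, q)
  }
  __syncthreads();                 // U and V live in shared memory
}
```

The 3D version would extend this the same way, recovering three logical indices from the flat `t_id_x` rather than looping over 2D slabs.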