[`IterativeTilingAndFusionPass`] Wrap linalg.ops in a loop even if the shape is smaller than min tiling size

When the shape of a linalg operation is smaller than or equal to the minimal tile size (32), the pass leaves the operation untouched. This is a problem because our GPU pipeline expects a for-loop (which will later describe the launch grid) after the `IterativeTilingAndFusion` pass. If there is no loop, the pipeline breaks.

For stability, I would expect such operations to be wrapped in a single-iteration for-loop, just so the pipeline keeps working in these corner cases:

```mlir
func.func @linalg_matmul(%arg0: tensor<32x32xf16>, %arg1: tensor<32x32xf16>,
                         %arg2: tensor<32x32xf16>) -> tensor<32x32xf16> {
  %0 = linalg.matmul ins(%arg0, %arg1 : tensor<32x32xf16>, tensor<32x32xf16>)
                     outs(%arg2 : tensor<32x32xf16>) -> tensor<32x32xf16>
  return %0 : tensor<32x32xf16>
}

// Expected output (a tiling for loop consisting of one iteration):
func.func @linalg_matmul(%arg0: tensor<32x32xf16>, %arg1: tensor<32x32xf16>, %arg2: tensor<32x32xf16>) -> tensor<32x32xf16> {
  %0 = scf.forall (%arg3, %arg4) = (0, 0) to (32, 32) step (32, 32) shared_outs(%arg5 = %arg2) -> (tensor<32x32xf16>) {
    %extracted_slice = tensor.extract_slice %arg0[%arg3, 0] [32, 32] [1, 1] : tensor<32x32xf16> to tensor<32x32xf16>
    %extracted_slice_0 = tensor.extract_slice %arg1[0, %arg4] [32, 32] [1, 1] : tensor<32x32xf16> to tensor<32x32xf16>
    %extracted_slice_1 = tensor.extract_slice %arg5[%arg3, %arg4] [32, 32] [1, 1] : tensor<32x32xf16> to tensor<32x32xf16>
    %1 = linalg.matmul ins(%extracted_slice, %extracted_slice_0 : tensor<32x32xf16>, tensor<32x32xf16>) outs(%extracted_slice_1 : tensor<32x32xf16>) -> tensor<32x32xf16>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %1 into %arg5[%arg3, %arg4] [32, 32] [1, 1] : tensor<32x32xf16> into tensor<32x32xf16>
    }
  }
  return %0 : tensor<32x32xf16>
}
```
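As a rough illustration of the bounds in the expected output above, here is a minimal Python sketch of the arithmetic only. It is not the pass's implementation: `forall_bounds` and `trip_counts` are hypothetical helper names, and clamping the step to the dimension size is an assumption about how shapes strictly smaller than 32 could be handled.

```python
MIN_TILE_SIZE = 32  # minimal tile size mentioned in this issue


def forall_bounds(shape, tile_size=MIN_TILE_SIZE):
    """Return (lower, upper, step) for a tiling loop over `shape`.

    Assumes static, positive dimensions. The step is clamped to the
    dimension size (an assumption), so a small dimension still gets a loop.
    """
    lower = tuple(0 for _ in shape)
    upper = tuple(shape)
    step = tuple(min(dim, tile_size) for dim in shape)
    return lower, upper, step


def trip_counts(shape, tile_size=MIN_TILE_SIZE):
    """Number of iterations per dimension: ceil(upper / step)."""
    _, upper, step = forall_bounds(shape, tile_size)
    return tuple(-(-u // s) for u, s in zip(upper, step))


print(forall_bounds((32, 32)))  # ((0, 0), (32, 32), (32, 32))
print(trip_counts((32, 32)))    # (1, 1)
print(trip_counts((16, 8)))     # (1, 1)
print(trip_counts((64, 32)))    # (2, 1)
```

Under these assumptions, any shape at or below the minimal tile size yields exactly one iteration per dimension, which matches the `(0, 0) to (32, 32) step (32, 32)` loop requested above.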

P.S. This is not critical, since in real-life scenarios we are unlikely to encounter ops with such small shapes.

[`IterativeTilingAndFusionPass`] Wrap linalg.ops in a loop even if the shape is smaller than min tiling size #332
