Add interpolation kernels (only supports 2x and align_corners=False) #70

Open, wants to merge 1 commit into base: main

@munkim munkim commented Apr 15, 2025

This PR adds NKI kernel samples for bilinear and trilinear interpolation. As the title mentions, these kernels only support upsampling to 2x dimensions with align_corners=False.

In collaboration with @plienhar.
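
For readers who want a numeric baseline for the 2x, align_corners=False case, the sketch below is a plain-Python reference, not the NKI kernel from this PR. The function names `upsample2x_1d` and `upsample2x_2d` are hypothetical. It uses the half-pixel mapping with edge clamping, which gives interior weights of 0.75 and 0.25 in 1D, and builds the bilinear case from two separable 1D passes.

```python
def upsample2x_1d(row):
    """2x linear upsampling of a 1D sequence, align_corners=False semantics."""
    n = len(row)
    out = []
    for j in range(2 * n):
        # Half-pixel mapping from destination index j to a source coordinate.
        x = (j + 0.5) / 2.0 - 0.5
        x = max(x, 0.0)            # clamp at the left edge
        i0 = int(x)                # floor, valid because x >= 0
        i1 = min(i0 + 1, n - 1)    # clamp the right neighbour at the edge
        t = x - i0                 # interpolation weight: 0.0, 0.25 or 0.75
        out.append((1.0 - t) * row[i0] + t * row[i1])
    return out


def upsample2x_2d(img):
    """Bilinear 2x upsampling as two separable 1D passes (rows, then columns)."""
    rows = [upsample2x_1d(r) for r in img]
    cols = [upsample2x_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    for line in upsample2x_2d([[0.0, 4.0], [8.0, 12.0]]):
        print(line)
```

A trilinear reference could follow the same pattern with a third 1D pass along the depth axis.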

Testing:

Please see detailed unit test requirements in the CONTRIBUTING.md

  • The change is covered by numeric check using nki.baremetal
  • The change is covered by performance benchmark test using nki.benchmark
  • The change is covered by end-to-end integration test

Pull Request Checklist

  • I have filled in all the required fields in the template
  • I have tested locally that all the tests pass
  • By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.

Contributor

@JonathanHenson JonathanHenson left a comment

In general, let's use mgrid instead of nl.arange. Also, the block dimension for SBUF and PSUM tensors is deprecated, so the block either needs to be moved to the free dimension or be replaced with a list of 2D tiles.

h_end_hbm_dst = 2 * h_end_hbm_src

for p in nl.affine_range(math.ceil(n * c / P_TILE_SIZE)):
    out_tile = nl.ndarray([P_TILE_SIZE, h_tile_size_dst, w_dst], dtype=src_arr.dtype, buffer=nl.sbuf)
Contributor

The block dimension on SBUF is being deprecated. Instead, use the free dimension for P_TILE_SIZE, or create a list of 2D tiles and use them.

Hey @JonathanHenson!
Thank you for your feedback! By "block dimension on SBUF is being deprecated", do you mean that nl.ndarray on SBUF can no longer have more (logical) dimensions than the two physical dimensions? Would you have a code snippet of the two alternatives and/or docs that explain further what is being deprecated?

Just trying to adjust my mental model of what is and is no longer allowed so we can adjust how we approach the problem. In the current implementation, we start from a 4D NCHW input tensor. The input tensor is reshaped into a 3D one of shape (N*C, H, W). We then tile along both the first and second dimensions, i.e. we operate on tiles of shape (p_tile_size, h_tile_size, W), with the first dimension mapped to the SBUF partition dimension and the remaining two dimensions mapped to the SBUF free dimension.
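
The tiling described above can be sketched in plain Python to make the index arithmetic concrete. This is a rough illustration rather than the kernel itself. The helper name `plan_tiles` and the default tile sizes (128 and 32) are assumptions chosen for the example. The per-partition free size is listed only to show how the two trailing dimensions could be counted as a single flat free dimension, which is one possible reading of the reviewer's suggestion.

```python
import math


def plan_tiles(n, c, h, w, p_tile_size=128, h_tile_size=32):
    """List the (partition range, H range, free size) of each tile.

    The NCHW input is viewed as (n * c, h, w): the first dimension is tiled
    onto the partition dimension and H is tiled in the free dimension.
    Tile sizes here are illustrative assumptions.
    """
    nc = n * c
    tiles = []
    for p in range(math.ceil(nc / p_tile_size)):
        p_start = p * p_tile_size
        p_end = min(p_start + p_tile_size, nc)   # last tile may be partial
        for h_start in range(0, h, h_tile_size):
            h_end = min(h_start + h_tile_size, h)
            # Elements per partition if (h, w) is flattened into one free dim.
            free_size = (h_end - h_start) * w
            tiles.append(((p_start, p_end), (h_start, h_end), free_size))
    return tiles


if __name__ == "__main__":
    for tile in plan_tiles(n=2, c=96, h=40, w=8):
        print(tile)
```

With a flattened free dimension, element (h_local, w_idx) of a tile would sit at offset h_local * w + w_idx within its partition.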

out_tile = nl.ndarray([P_TILE_SIZE, h_tile_size_dst, w_dst], dtype=src_arr.dtype, buffer=nl.sbuf)

### Load input array from HBM
i_p = p * P_TILE_SIZE + nl.arange(P_TILE_SIZE)[:, None, None]
Contributor

use mgrid
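
For context on this suggestion: `nl.mgrid` is documented as following NumPy's `mgrid` semantics, so the difference between the two indexing styles can be sketched with plain NumPy. The sizes and the tile index `p` below are illustrative assumptions, and NumPy stands in for NKI here, so treat this as an analogy rather than kernel code.

```python
import numpy as np

P_TILE_SIZE, H, W = 4, 3, 5   # small illustrative sizes, not the kernel's
p = 1                         # hypothetical partition-tile index
src = np.arange(2 * P_TILE_SIZE * H * W).reshape(2 * P_TILE_SIZE, H, W)

# arange style: one index vector per axis, broadcast by hand with None.
i_p = p * P_TILE_SIZE + np.arange(P_TILE_SIZE)[:, None, None]
i_h = np.arange(H)[None, :, None]
i_w = np.arange(W)[None, None, :]
tile_arange = src[i_p, i_h, i_w]

# mgrid style: all index grids come from one expression, already shaped.
# In the kernel this might read (assumption):
#   i_p, i_h, i_w = nl.mgrid[0:P_TILE_SIZE, 0:h_tile_size, 0:w]
g_p, g_h, g_w = np.mgrid[0:P_TILE_SIZE, 0:H, 0:W]
tile_mgrid = src[p * P_TILE_SIZE + g_p, g_h, g_w]

# Both styles select the same tile.
assert np.array_equal(tile_arange, tile_mgrid)
```

The mgrid form avoids the per-axis `[:, None, None]` bookkeeping, which tends to get error-prone as the number of dimensions grows (as in the 5D index a few lines further down in this diff).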

### Core region
weight_2d = weight_1d**2

i_p = nl.arange(P_TILE_SIZE)[:, None, None, None, None]
Contributor

mgrid?

d_start_hbm_dst = 2 * d_start_hbm_src + 1 if d_start_hbm_src else 0
d_end_hbm_dst = 2 * d_end_hbm_src
for p in nl.affine_range(math.ceil(n * c / P_TILE_SIZE)):
    out_tile = nl.ndarray([P_TILE_SIZE, d_tile_size_dst, h_dst, w_dst], dtype=src_arr.dtype, buffer=nl.sbuf)
Contributor

Refactor P_TILE_SIZE to the free dimension, or use a list of 2D tiles. The block dimension is being deprecated.


3 participants