Create MaxPool2D.py #67
base: main
New NKI kernel!
sz_cin, sz_hin, sz_win = in_tensor.shape
sz_hout = (sz_hin + 2*padding - kernel_size) // stride + 1
sz_wout = (sz_win + 2*padding - kernel_size) // stride + 1
Let's add assertions on the expected shape and parameter values here.
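Picking up the request for assertions, here is a minimal plain-Python sketch of checks that could run before the output size is computed. The helper name `maxpool2d_out_shape`, the `pmax=128` channel limit (assumed to match `nl.tile_size.pmax`), and the `padding <= kernel_size // 2` rule (borrowed from PyTorch's `MaxPool2d`) are assumptions, not part of the PR.

```python
def maxpool2d_out_shape(in_shape, kernel_size, stride, padding, pmax=128):
    """Hypothetical helper: validate MaxPool2D arguments, return (C, H_out, W_out).

    Assumptions: input is [C, H, W]; channels map to partitions, so C <= pmax
    (128 assumed, as nl.tile_size.pmax); padding <= kernel_size // 2 follows
    PyTorch's MaxPool2d rule and guarantees no window is entirely padding.
    """
    assert len(in_shape) == 3, f"expected [C, H, W], got {in_shape}"
    sz_cin, sz_hin, sz_win = in_shape
    assert kernel_size >= 1 and stride >= 1, "kernel_size and stride must be >= 1"
    assert 0 <= padding <= kernel_size // 2, "padding must be in [0, kernel_size // 2]"
    assert sz_cin <= pmax, f"{sz_cin} channels exceed the partition limit {pmax}"
    assert sz_hin + 2 * padding >= kernel_size, "kernel taller than padded input"
    assert sz_win + 2 * padding >= kernel_size, "kernel wider than padded input"
    # Same output-size formula as the kernel.
    sz_hout = (sz_hin + 2 * padding - kernel_size) // stride + 1
    sz_wout = (sz_win + 2 * padding - kernel_size) // stride + 1
    return sz_cin, sz_hout, sz_wout
```

For example, a [4, 8, 8] input with kernel_size=3, stride=2, padding=1 gives (4, 4, 4), since (8 + 2 - 3) // 2 + 1 = 4.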
sz_p = sz_cin
# Generate pool index patterns with stride
i0 = nl.arange(sz_p)[:, None, None, None, None]  # Channel dim
Let's use mgrid for this.
i4 = nl.arange(kernel_size)[None, None, None, None, :]  # Pool width
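On the mgrid suggestion: the chain of `nl.arange(...)[:, None, ...]` expressions builds open, broadcastable index grids. The sketch below is a NumPy analogue, not NKI code: `np.ogrid` produces the same five open grids in one line, and the gather `x[i0, stride*i1 + i3, stride*i2 + i4]` is then reduced over the two window axes. The roles of the middle grids `i1`..`i3` (output row, output column, pool height) are assumed, since the diff only shows `i0` and `i4`, and padding is omitted here. In the kernel itself, NKI's `nl.mgrid` would be the counterpart; check its documentation for the exact semantics.

```python
import numpy as np


def maxpool2d_gather(x, kernel_size, stride):
    """NumPy sketch of an unpadded max pool on a [C, H, W] array.

    Uses broadcast 5-D index grids, mirroring the arange-with-None pattern.
    """
    c, h, w = x.shape
    h_out = (h - kernel_size) // stride + 1
    w_out = (w - kernel_size) // stride + 1
    # Open grids: shapes (c,1,1,1,1), (1,h_out,1,1,1), ..., (1,1,1,1,k).
    i0, i1, i2, i3, i4 = np.ogrid[0:c, 0:h_out, 0:w_out,
                                  0:kernel_size, 0:kernel_size]
    # (i1, i2) pick the window origin; (i3, i4) walk inside the window.
    windows = x[i0, stride * i1 + i3, stride * i2 + i4]  # [C, H_out, W_out, K, K]
    return windows.max(axis=(3, 4))
```

`np.mgrid` would give the same result, but it materializes every grid at the full 5-D size, so `np.ogrid` is the closer match to the broadcasting `arange` pattern.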
# Load input data
in_tile: tensor[sz_p, sz_hin, sz_win] = nl.load(in_tensor)
Per the docs here, your partition dimension must be the first dimension. These should be 2D tiles; we're deprecating the block dimension on SBUF.
@JonathanHenson The tiling was based on the average_pool2d example:
nki-samples/src/nki_samples/tutorials/average_pool2d/average_pool2d_nki_kernels.py, line 41 in 3c5d277:
in_tile: tensor[sz_p, sz_hin, sz_win] = nl.load(in_tensor)
Will that example be updated any time soon?
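The layout rule from the review can be illustrated with a NumPy sketch. This is an assumption about how the kernel might be restructured, not code from the PR or the NKI API: keep channels on the partition axis, fold H and W into a single free axis so each tile is 2D, and split the channels into chunks of at most 128 partitions (assumed to equal `nl.tile_size.pmax`). An input element `(c, h, w)` then lives at column `h * W + w` of its tile.

```python
import numpy as np

PMAX = 128  # assumed SBUF partition limit (nl.tile_size.pmax in NKI)


def to_2d_tiles(x, pmax=PMAX):
    """Hypothetical helper: split [C, H, W] into 2-D [<= pmax, H*W] tiles.

    The partition (channel) axis stays first; H and W are folded into one
    free axis, so element (c, h, w) lands at column h * W + w.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)
    return [flat[p0:p0 + pmax] for p0 in range(0, c, pmax)]


def from_2d_tiles(tiles, h, w):
    """Inverse of to_2d_tiles: stack the channel chunks and unfold H*W."""
    return np.concatenate(tiles, axis=0).reshape(-1, h, w)
```

With 300 channels, this yields tiles of 128, 128, and 44 partitions, each with H*W free elements.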
This kernel is intended to replace the MaxPool2D function. I'm not sure whether it is faster than a traced PyTorch version.
However, it does show an interesting use of masking to avoid extra memory writes: instead of padding every edge with rows and columns of -inf, I adjust the indices and mask out the values for the padding rows and columns I never inserted.
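The masking idea can be checked against the straightforward approach in NumPy. `maxpool2d_padded` builds a -inf padded copy; `maxpool2d_masked` never allocates one: it shifts the window indices by `-padding`, clamps them into range for the read, and replaces out-of-range taps with -inf before taking the max. Both functions are hypothetical test references, a NumPy analogue of masked access rather than the kernel's actual NKI code, and they assume a floating-point input and `padding <= kernel_size // 2` so that no window is entirely padding.

```python
import numpy as np


def maxpool2d_padded(x, k, s, p):
    """Reference: pad every edge with -inf, then take strided window maxima."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    h_out = (h + 2 * p - k) // s + 1
    w_out = (w + 2 * p - k) // s + 1
    out = np.empty((c, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            out[:, i, j] = xp[:, i*s:i*s+k, j*s:j*s+k].max(axis=(1, 2))
    return out


def maxpool2d_masked(x, k, s, p):
    """No padded copy: shift indices by -p, clamp for the read, mask the rest."""
    c, h, w = x.shape
    h_out = (h + 2 * p - k) // s + 1
    w_out = (w + 2 * p - k) // s + 1
    i0, i1, i2, i3, i4 = np.ogrid[0:c, 0:h_out, 0:w_out, 0:k, 0:k]
    rows = s * i1 + i3 - p  # may fall outside [0, h) at the top/bottom edges
    cols = s * i2 + i4 - p  # may fall outside [0, w) at the left/right edges
    valid = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    # Clamped reads stay in bounds; their values are discarded where invalid.
    taps = x[i0, np.clip(rows, 0, h - 1), np.clip(cols, 0, w - 1)]
    return np.where(valid, taps, -np.inf).max(axis=(3, 4))
```

The two functions should agree exactly, because both take the maximum over the same set of real input values in each window.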
All tests are included in the code.
Testing:
Please see the detailed unit test requirements in CONTRIBUTING.md.
nki.baremetal
nki.benchmark
Pull Request Checklist