Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dataflow] Packed systolic matmul test caes #264

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jiahanxie353
Copy link

@jiahanxie353 jiahanxie353 commented Nov 24, 2024

Description

This PR tries to add a test case for packed systolic using the Dataflow module

Proposed Solutions

I have added a test case called tests/dataflow/test_packed_systolic.py; but I'm still trying to figure out the indexing problems on this line, which raises:

  File "allo/allo/ir/builder.py", line 1232, in build_Subscript
    return ASTTransformer.build_bit_operation(ctx, node, val=val, idx=idx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "allo/allo/ir/builder.py", line 1152, in build_bit_operation
    value = build_stmt(ctx, node.value)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "allo/allo/ir/builder.py", line 73, in __call__
    res = method(ctx, node, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "allo/allo/ir/builder.py", line 1228, in build_Subscript
    return ASTTransformer.build_memory_access(ctx, node, val=val, idx=idx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "allo/allo/ir/builder.py", line 1117, in build_memory_access
    val.results[idx], value.result, ivs, affine_attr, ip=ctx.get_ip()
    ^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'results'

Examples

Example is shown in the tests/dataflow/test_packed_systolic.py. Is there any obvious error in the example?

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [IR], [Builder], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage (It would be better to provide ~2 different test cases to test the robustness of your code)
  • Code is well-documented

@chhzh123
Copy link
Member

@jiahanxie353 Thanks for contributing! Please follow this guideline to format your code.

@jiahanxie353
Copy link
Author

@jiahanxie353 Thanks for contributing! Please follow this guideline to format your code.

thanks for getting back @chhzh123 !
can you maybe send another link to the guideline? the one you sent was the homepage of allo

@chhzh123
Copy link
Member

@jiahanxie353
Copy link
Author

https://cornell-zhang.github.io/allo/developer/index.html#id1

thanks! I believe I have passed all the code-format testing

any advice on fixing the issue? I wonder why I can't index into an int16[][] type like this: Z_packed[i - 1, j - 1][k * 8 : (k + 1) * 8]

@chhzh123
Copy link
Member

Maybe you can try separate the memory access -- first load Z_packed[i - 1, j - 1], increment the bits of [k * 8 : (k + 1) * 8], and then store back the new value. You can check out a data packing example here.

Copy link
Member

@chhzh123 chhzh123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test a larger systolic array? I tested your code with L, D = 4, 4 -- the effective SA size of 2x2, and it failed

import numpy as np

L, D = 2, 2
M, N, K = L, 1 * D, D
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is 1 here? Maybe create a constant variable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed L, D and only kept M, N, K for clearance. Please let me know if you prefer to keep L, D

Comment on lines +15 to +16
if PP == 2:
np_type = np.int16
allo_type = int16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add PP=4, 8 cases?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I added PP=4.
Now I'm hard-coding each basic element size is int8, which forces allo_type == int8 * PP. I will make it parametric and add different PP-allo_type combinations

with allo.meta_elif(j == 0):
# i > 0
for k in range(K):
fifo_A[i, j + 1].put(X_packed[(i - 1) * PP, k])
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks suspicious.
X_packed has type allo_type[M, K // PP]. And for example, when M, N, K = 4, 4, 2, X_packed type is allo_type[4, 1]. And if we have a for-loop: for k in range(2). Suppose k = 1, not only it didn't go out of bound when accessing X_packed array, the results are all correct...

maybe we should look into this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants