
src:cpu:aarch64 restricts the conv operation to just one post-op when… #1689

Open · wants to merge 2 commits into base: main
Conversation

renato-arantes
Contributor

@renato-arantes renato-arantes commented Jul 20, 2023

Description

When using fp16 on AArch64, multiple conv tests fail due to a precision problem in oneDNN's fp16 operations. oneDNN performs all fp16 post-op operations in fp32 and then down-converts to fp16 at the end, leading to a rounding error. To resolve this issue, this patch limits the conv operation to a single post-op when the src/wei/dst data type is fp16.

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@mgouicem
Contributor

mgouicem commented Jul 24, 2023

Hi @renato-arantes thanks for the patch.
Could you

  • share the bug instances this patch fixes (maybe a few benchdnn repro lines)
  • describe the root cause of the issue you are fixing?

@renato-arantes
Contributor Author

renato-arantes commented Jul 25, 2023

Hi @mgouicem

The root cause is that oneDNN performs all fp16 post-op operations in fp32 and then down-converts to fp16 at the end, leading to a rounding error. To work around this issue, we decided to limit the conv operation to a single post-op when the src/wei/dst data type is fp16.
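The rounding discrepancy can be illustrated with a small self-contained sketch. The accumulator and addend values below are hypothetical, chosen only to expose the double-rounding difference between "post-ops in higher precision, one final fp16 rounding" and "fp16 rounding after every post-op"; Python's `struct` module is used to round to IEEE half precision:

```python
import struct

def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE fp16 value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Hypothetical values (not from the failing tests): a conv accumulator
# followed by two "add" post-ops.
acc, b1, b2 = 1.0, 0.0002, 0.0003

# oneDNN reference semantics: run the whole post-op chain in higher
# precision and down-convert to fp16 once at the very end.
ref = to_f16(acc + b1 + b2)

# fp16-throughout semantics: round to fp16 after every step.
alt = to_f16(to_f16(to_f16(acc) + to_f16(b1)) + to_f16(b2))

print(ref, alt)  # 1.0009765625 vs 1.0 -- the paths differ by one fp16 ulp
```

Each small addend is individually swallowed by fp16 rounding near 1.0, while their sum in higher precision crosses the rounding midpoint, so the two conventions legitimately produce different fp16 results.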

Here are two benchdnn repro lines that result in an error before the patch:

  • --conv --skip-impl=ref --allow-enum-tags-only=false --dt=f16:f16:f16 --dtag=acdb --attr-post-ops=tanh+sum:1:0:f16 mb1ic1ih10oc3oh10kh3ph1n

  • --conv --skip-impl=ref --allow-enum-tags-only=false --dt=f16:f16:f16 --attr-post-ops=sum:1:0:f16+mul:f16:per_oc+add:f16:per_oc+relu mb1ic128ih38oc128oh38kh3ph1n

I have several more if you need them.

@mgouicem
Contributor

mgouicem commented Jul 25, 2023

Hi @renato-arantes

The root cause is that oneDNN performs all fp16 post-op operations in fp32 and then down-converts to fp16 at the end, leading to a rounding error.

This is the expected behavior in oneDNN, and it is exactly what benchdnn checks. I just confirmed that these two tests pass with the x64 implementation, which has that exact behavior (computing post-ops in f32).

To work around this issue, we decided to limit the conv operation to a single post-op when the src/wei/dst data type is fp16.

Could you clarify how this removes the f32->f16 down-conversion? I would expect the f32->f16 conversion to happen regardless of the number of post-ops, since convolution accumulates in f32 anyway.

@renato-arantes
Contributor Author

renato-arantes commented Jul 25, 2023

Hi @mgouicem,

This is the expected behavior in oneDNN, and it is exactly what benchdnn checks. I just confirmed that these two tests pass with the x64 implementation, which has that exact behavior (computing post-ops in f32).

Both fail for me, as this patch is for aarch64 and not for x64! You can check that my patch only changes files under src/cpu/aarch64.

Here is the benchdnn output for the first one:

./tests/benchdnn/benchdnn --conv --skip-impl=ref --allow-enum-tags-only=false --dt=f16:f16:f16 --dtag=acdb --attr-post-ops=tanh+sum:1:0:f16 mb1ic1ih10oc3oh10kh3ph1n
[ 2][DST][0:0:0:2] exp_f32:-1.66893e-06 exp:-1.66893e-06 got: -1 diff:0.999998 rdiff: 599185
[ 7][DST][0:0:0:7] exp_f32: 2.00495 exp: 2.00586 got: 2.00391 diff:0.00195312 rdiff:0.00097371
[ 17][DST][0:0:1:7] exp_f32: -2.99505 exp: -2.99414 got: -2.99609 diff:0.00195312 rdiff:0.000652316
[ 101][DST][0:1:0:1] exp_f32: -2 exp: -2 got: -1 diff: 1 rdiff: 0.5
[ 102][DST][0:1:0:2] exp_f32: 2 exp: 2 got: 3 diff: 1 rdiff: 0.5
[ 105][DST][0:1:0:5] exp_f32: -2.00495 exp: -2.00586 got: -2.00391 diff:0.00195312 rdiff:0.00097371
[ 144][DST][0:1:4:4] exp_f32: -3.00495 exp: -3.00586 got: -3.00391 diff:0.00195312 rdiff:0.000649773
[ 156][DST][0:1:5:6] exp_f32: 2.99505 exp: 2.99414 got: 2.99609 diff:0.00195312 rdiff:0.000652316
[ 190][DST][0:1:9:0] exp_f32: 2.99505 exp: 2.99414 got: 2.99609 diff:0.00195312 rdiff:0.000652316
0:FAILED (errors:16 total:300) __REPRO: --conv --skip-impl=ref --allow-enum-tags-only=false --dt=f16:f16:f16 --dtag=acdb --attr-post-ops=tanh+sum:1:0:f16 mb1ic1ih10oc3oh10kh3ph1
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0

I built oneDNN from the github/master and the top commit is 5f9ff5d9ab3461047222a6e71a4e7bad0e838caf.
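For reading the failure table above: diff and rdiff appear to be the absolute and relative error of got versus exp. A minimal sketch, assuming rdiff = |got - exp| / |exp| (an assumption about benchdnn's definition, not its actual source), reproduces the first failing entry:

```python
# First failing DST entry from the benchdnn output above.
exp, got = -1.66893e-06, -1.0

diff = abs(got - exp)        # absolute error
rdiff = diff / abs(exp)      # assumed definition of the relative error

print(f"diff:{diff:.6f} rdiff:{rdiff:.0f}")  # diff:0.999998 rdiff:599185
```

The huge rdiff on that entry comes from the expected value being close to zero, which inflates any absolute error when expressed relatively.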

@renato-arantes
Contributor Author

renato-arantes commented Jul 25, 2023

Hi @mgouicem,

Could you clarify how this removes the f32->f16 down-conversion? I would expect the f32->f16 conversion to happen regardless of the number of post-ops, since convolution accumulates in f32 anyway.

ACL does all post-ops in f16 when requested by --dt=f16:f16:f16, and when there is more than one post-op the results sometimes differ from oneDNN's due to oneDNN's downcast.

This patch doesn't remove the down-conversion. But in the case of aarch64, it limits the number of post-op operations to just one, so the probability of a comparison failure against the ACL result due to the f32->f16 downcast is lower. Currently, I haven't seen any test with just one post-op that fails.

@mgouicem
Contributor

Both fail for me, as this patch is for aarch64 and not for x64! You can check that my patch only changes files under src/cpu/aarch64.

Sorry if my point was not clear. What I meant is that if post-ops are run in f32, and the down-conversion happens only once before writing to dst (as in the x64 implementation), then there should be no failures in benchdnn for those test cases.

ACL does all post-ops in f16 when requested by --dt=f16:f16:f16, and when there is more than one post-op the results sometimes differ from oneDNN's due to oneDNN's downcast.

It makes sense that the issue is post-ops using f16 instead of f32. There were a few discussions on this topic, and the plan was to rely on f32 post-ops for ACL (see this PR and this PR). Has anything changed with respect to this?

This patch doesn't remove the down-conversion. But in the case of aarch64, it limits the number of post-op operations to just one, so the probability of a comparison failure against the ACL result due to the f32->f16 downcast is lower. Currently, I haven't seen any test with just one post-op that fails.

Hence my concern: this PR does not seem to address the actual issue (the use of f16 in post-ops), but mostly hides it (benchdnn uses a higher threshold when post-ops are used, so with one post-op not much error is compounded and the test passes).

@igorsafo
Contributor

Discussed this PR with @milpuz01, @dzarukin, and @cfRod:

  • The decision is to keep post-ops supported until an RFC about fp16 post-ops is ready and implemented.
    • Switching off post-ops doesn't solve the issue and might introduce performance degradation on the user's side.
    • Failing benchdnn tests are acceptable as long as this is a known issue that will be addressed at some point.

@vpirogov
Member

The API to manage accumulation mode is now merged into the main branch. Please decide whether you want to continue working on this PR or open a new one.
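For context, a minimal sketch of how that attribute could be used, assuming the oneDNN C++ API with the accumulation-mode primitive attribute (this is an illustrative fragment, not code from this PR, and it requires the oneDNN headers/library to build):

```cpp
#include "oneapi/dnnl/dnnl.hpp"

int main() {
    // Primitive attributes carry the requested accumulation mode.
    dnnl::primitive_attr attr;

    // accumulation_mode::relaxed lets an implementation trade some
    // accumulation accuracy for performance; accumulation_mode::strict
    // (the default) keeps the reference f32 accumulation behavior.
    attr.set_accumulation_mode(dnnl::accumulation_mode::relaxed);

    // ... pass `attr` when creating the convolution primitive descriptor.
    return 0;
}
```

Under this API, the f16-post-op behavior discussed above would become an explicit, user-visible choice rather than an implementation detail.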
