Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RoIAlign #3493

Open
wants to merge 26 commits into
base: develop
Choose a base branch
from
Open

Implement RoIAlign #3493

wants to merge 26 commits into from

Conversation

anhskrttt
Copy link
Collaborator

  • Add RoIAlign operation for both forward and backward.
  • Add driver and gtest.
  • Performance condition:
    • For RoIAlignForward, MIOpen performs better if input and output tensors are contiguous.
      • ROCm pytorch's RoIAlign doesn't support for bfloat16 yet, so this is an always winning case.
    • For RoIAlignBackward, MIOpen performs better if:
      • Input tensor is float16 type
      • (Input tensors are contiguous) or (Input tensors are noncontiguous but input_numel size is smaller)

Average improvement over ROCm

type fwd bwd
float 6.24 0.97
float16 6.32 1.21

Detail Benchmark

fp32_fwd
input_shape rois_shape output_shape is_contiguous spatial_scale sampling_ratio ROCm MIOpen improvement
[16 256 160 160] [6 5] [7 7] TRUE 0.0625 -1 838074 11022 76.04
[6 256 200 320] [6 5] [7 7] TRUE 0.0625 -1 738074 13866 53.23
[16 256 160 160] [6 5] [7 7] TRUE 0.25 2 847994 16337 51.91
[6 256 200 320] [6 5] [7 7] TRUE 0.25 -1 755995 33812 22.36
[16 256 160 160] [6 5] [28 28] TRUE 0.0625 -1 875194 66168 13.23
[6 256 200 320] [6 5] [28 28] TRUE 0.25 -1 780314 76088 10.26
[16 256 160 160] [6 5] [28 28] TRUE 0.25 2 905273 112017 8.08
[6 256 200 320] [6 5] [28 28] TRUE 0.0625 2 797754 111875 7.13
[16 256 160 160] [512 5] [7 7] TRUE 0.0625 -1 1028473 347748 2.96
[6 256 200 320] [512 5] [7 7] TRUE 0.0625 2 1098552 575622 1.91
[16 256 160 160] [512 5] [7 7] TRUE 1 2 1881266 1121320 1.68
[1 1 64 64] [512 5] [7 7] TRUE 0.25 -1 16960 12817 1.32
[6 1 800 1060] [512 5] [7 7] TRUE 0.25 2 20480 14684 1.39
[6 256 200 320] [512 5] [7 7] TRUE 0.25 -1 1757267 1468540 1.20
[1 3 96 96] [6 5] [28 28] TRUE 0.25 -1 12480 10062 1.24
[6 256 200 320] [2149 5] [7 7] TRUE 0.25 2 2980138 2481650 1.20
[1 1 64 64] [512 5] [7 7] TRUE 0.0625 2 17760 14897 1.19
fp16_fwd
input_shape rois_shape output_shape is_contiguous spatial_scale sampling_ratio ROCm MIOpen improvement
[16 256 160 160] [6 5] [7 7] contiguous 0.0625 -1 773275 10631 72.74
[6 256 200 320] [6 5] [7 7] contiguous 0.0625 -1 687035 12906 53.23
[16 256 160 160] [6 5] [7 7] contiguous 0.0625 2 775515 14950 51.87
[16 256 160 160] [6 5] [7 7] contiguous 0.25 2 784474 15430 50.84
[6 256 200 320] [6 5] [7 7] contiguous 0.0625 2 691834 14577 47.46
[16 256 160 160] [6 5] [7 7] contiguous 1 2 777275 18133 42.87
[6 256 200 320] [6 5] [7 7] contiguous 0.25 2 692795 16248 42.64
[16 256 160 160] [6 5] [7 7] contiguous 0.25 -1 779514 21262 36.66
[6 256 200 320] [6 5] [7 7] contiguous 1 2 704475 21741 32.40
[6 256 200 320] [6 5] [7 7] contiguous 0.25 -1 699675 32070 21.82
[16 256 160 160] [6 5] [28 28] contiguous 0.0625 -1 805754 65155 12.37
[16 256 160 160] [6 5] [28 28] contiguous 0.25 -1 806394 65386 12.33
fp16_bwd
input_size rois_size output_size contiguous spatial_scale sampling_ratio ROCm MIOpen Improvement
[3 1 800 1060] [512 5] [28 28] TRUE 0.25 2 3776773 1101540 3.43
[1 1 800 1201] [512 5] [28 28] TRUE 0.25 2 4223650 1260380 3.35
[2 1 800 1060] [512 5] [28 28] TRUE 0.25 2 2998538 1017500 2.95
[1 1 800 1201] [6 5] [28 28] TRUE 0.0625 -1 1359510 475799 2.86
[3 1 800 1060] [512 5] [28 28] TRUE 1 2 1239991 446023 2.78
[1 1 64 64] [6 5] [28 28] FALSE 0.0625 -1 5180762 1944090 2.66
[1 1 64 64] [6 5] [28 28] TRUE 0.0625 -1 4979164 1887840 2.64
[1 1 64 64] [6 5] [28 28] FALSE 0.25 2 4175810 1677920 2.49
[1 1 800 1201] [512 5] [28 28] TRUE 1 2 1037592 455659 2.28
[6 1 800 1060] [2149 5] [28 28] TRUE 0.25 2 7006829 3170180 2.21
[6 256 200 320] [512 5] [28 28] TRUE 1 2 113196296 68001300 1.66
[6 1 800 1060] [512 5] [28 28] TRUE 0.25 2 1637108 993130 1.65
[1 3 96 96] [512 5] [28 28] FALSE 0.25 -1 11071280 6859840 1.61
[2 1 800 1060] [512 5] [28 28] TRUE 1 2 689115 427890 1.61
[1 1 64 64] [2149 5] [28 28] TRUE 0.25 -1 38307082 23917300 1.60
[1 3 96 96] [2149 5] [28 28] TRUE 0.0625 -1 39848670 25024400 1.59
[1 1 64 64] [2149 5] [28 28] FALSE 0.0625 2 14476410 10476100 1.38
[1 3 96 96] [512 5] [7 7] FALSE 1 2 1018780 743758 1.37
[6 1 800 1060] [2149 5] [28 28] TRUE 0.0625 -1 4765470 3529800 1.35
[1 3 96 96] [512 5] [28 28] TRUE 0.0625 -1 5826490 4406120 1.32
[1 1 64 64] [512 5] [28 28] FALSE 0.25 -1 3054872 2344600 1.30

@@ -39,3 +39,4 @@ The MIOpen API library is structured as follows:
* :doc:`ReLU <../doxygen/html/group___re_l_u>` (experimental)
* :doc:`Kthvalue <../doxygen/html/group__kthvalue>` (experimental)
* :doc:`GLU <../doxygen/html/group__glu>` (experimental)
* :doc:`RoIAlign <../doxygen/html/group__RoIAlign>` (experimental)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* :doc:`RoIAlign <../doxygen/html/group__RoIAlign>` (experimental)
* :doc:`RoIAlign <../doxygen/html/group___ro_i_align>` (experimental)

@anhskrttt anhskrttt self-assigned this Feb 10, 2025
@amd-jnovotny
Copy link
Contributor

@anhskrttt Do we need a changelog entry for this addition?

@anhskrttt
Copy link
Collaborator Author

anhskrttt commented Feb 10, 2025

Do we need a changelog entry for this addition?

@amd-jnovotny I actually couldn't find any MIOpen documentation specifying which types of additions require a changelog entry...? I noticed that similar ops, such as PReLU and GLU, have been included in the changelog here. Based on that, I think this op could also be added.

cc @long10024070 Do you see any issues with adding this op to the changelog?

@long10024070
Copy link
Collaborator

long10024070 commented Feb 10, 2025

@anhskrttt Do we need a changelog entry for this addition?

@amd-jnovotny I think editing changelog should be totally handled from AMD side. It is not our decision.

@amd-jnovotny
Copy link
Contributor

@long10024070 thanks for the feedback. @BrianHarrisonAMD Should we add an entry to the MIOpen changelog? Maybe as part of a separate PR? If you can add something, I can review it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants