Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Z: Accelerate ArraysSupport.vectorizedHashCode #19455

Merged
merged 1 commit into from
May 30, 2024

Conversation

Spencer-Comin
Copy link
Contributor

ArraysSupport.vectorizedHashCode [1] is an intrinsic candidate introduced in JDK21. The hash function used is the same as the one used for hashing Strings, so this change leverages that by recognizing ArraysSupport.vectorizedHashCode and using the preexisting accelerated implementation of String.hashCode.

[1] https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/openj9/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L200

@Spencer-Comin
Copy link
Contributor Author

Internal Jenkins sanity test: https://hyc-runtimes-jenkins.swg-devops.com/view/OpenJ9%20-%20Personal/job/Pipeline-Build-Test-Personal/21953/

The only failure I see is a JITServer/CRIU failure that look to be unrelated

@Spencer-Comin
Copy link
Contributor Author

@r30shah fyi

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add some brief information about the changes in the commit message i.e, recognize and accelerate array hashcode generation using SIMD instruction?

@@ -97,6 +97,7 @@ J9::Z::CodeGenerator::initialize()
!TR::Compiler->om.canGenerateArraylets())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename the JIT option TR_DisableSIMDStringHashCode as it is now used for vectorizedHashCode as well. I think it makes sense to rename to TR_DisableSIMDHashCode.
And in place where we are recognizing the method in code-gen, we can use the environment flag to disable the String or Array hashcode acceleration (Before suppressing the inlining through traditional inliner).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I also rename the code generator flag SupportsInlineStringHashCode to SupportsInlineHashCode and use that instead of adding a SupportsInlineVectorizedHashCode flag?

The TR_DisableSIMDStringHashCode option is defined in OMR, it might make more sense to migrate it to just be in OpenJ9, and rename it as part of that change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the advantage over having that option in OMR was that it can be applied to selective method, this may not be something we would want when we move that to OpenJ9

runtime/compiler/z/codegen/J9TreeEvaluator.cpp Outdated Show resolved Hide resolved
runtime/compiler/z/codegen/J9TreeEvaluator.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes, overall looks good to me. Once you make those cosmetic changes, will launch builds

Adds ArraysSupport.vectorizedHashCode as a recognized method and accelerate
with a SIMD instruction sequence. The hash function is the same as
String.hashCode, so the instruction generation logic is commoned between the
two into a helper function.

Signed-off-by: Spencer Comin <[email protected]>
@Spencer-Comin
Copy link
Contributor Author

Here's a couple instruction selection logs:

Integer array

------------------------------
 n9n      (  0)  treetop                                                                              [     0x3ff48f19910] bci=[-1,6,40] rc=0 vc=106 vn=- li=2 udi=- nc=1
 n8n      (  1)    icall  jdk/internal/util/ArraysSupport.vectorizedHashCode(Ljava/lang/Object;IIII)I[#408  static Method] [flags 0x500 0x0 ] (in GPR_0046) ()  [     0x3ff48f198c0] bci=[-1,6,40] rc=1 vc=106 vn=- li=2 udi=- nc=5 flg
 n3n      (  0)      aload  <parm 0 Ljava/lang/Object;>[#399  Parm] [flags 0xc0000107 0x0 ] (in &GPR_0032)  [     0x3ff48f19730] bci=[-1,0,40] rc=0 vc=106 vn=- li=2 udi=- nc=0
 n4n      (  0)      iload  <parm 1 I>[#400  Parm] [flags 0xc0000103 0x0 ] (in GPR_0036) (cannotOverflow )  [     0x3ff48f19780] bci=[-1,1,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x1000
 n5n      (  0)      iload  <parm 2 I>[#401  Parm] [flags 0xc0000103 0x0 ] (in GPR_0039) (cannotOverflow )  [     0x3ff48f197d0] bci=[-1,2,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x1000
 n6n      (  0)      iload  <parm 3 I>[#402  Parm] [flags 0xc0000103 0x0 ] (in GPR_0046) (cannotOverflow )  [     0x3ff48f19820] bci=[-1,3,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x1000
 n7n      (  0)      iconst 10 (X!=0 X>=0 )                                                           [     0x3ff48f19870] bci=[-1,4,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x104
------------------------------

LG      &GPR_0032, Parm[<parm 0 Ljava/lang/Object;>] ?+0(GPR5)
L       GPR_0036, Parm[<parm 1 I>] ?+8(GPR5)
L       GPR_0039, Parm[<parm 2 I>] ?+16(GPR5)
LGFR    GPR_0036,GPR_0036
LGFR    GPR_0039,GPR_0039
L       GPR_0046, Parm[<parm 3 I>] ?+24(GPR5)
Label L0032:     # (Start of internal control flow)
SLLG    GPR_0036,GPR_0036,2
SLLG    GPR_0039,GPR_0039,2
LA      GPR_0043,#410 0(GPR_0039,GPR_0036)
CLGIJ   GPR_0039,Label L0036,16,BM(mask=0x4),
Label L0033:
SLGFI   GPR_0043,12
LARL    GPR_0047, &<LiteralPool Base Address>     ; LoadLitPool
VL      VRF_0040,#411 =X(0) 0(GPR_0047),0
VGBM    VRF_0041,0x0
VLVGF   VRF_0041,GPR_0046,#412 3
Label L0034:
VL      VRF_0042,#413 8(GPR_0036,&GPR_0032),0
VMALF   VRF_0041,VRF_0041,VRF_0040,VRF_0042
LA      GPR_0036,#414 16(GPR_0036)
CLGRJ   GPR_0036,GPR_0043,Label L0034,MASK5(mask=0x4),
Label L0035:
LARL    GPR_0048, &<LiteralPool Base Address>     ; LoadLitPool
VL      VRF_0040,#411 =X(0) 0(GPR_0048),0
VMLF    VRF_0041,VRF_0041,VRF_0040
VGBM    VRF_0040,0x0
VSUMQF  VRF_0040,VRF_0041,VRF_0040
LA      GPR_0043,#415 12(GPR_0043)
VLGVF   GPR_0046,VRF_0040,#416 3
Label L0036:     # (End of internal control flow)
CLGRJ   GPR_0036,GPR_0043,Label L0038,MASK11(mask=0xa),
Label L0037:     # (Start of internal control flow)
SLLK    GPR_0039,GPR_0046,5
SGR     GPR_0039,GPR_0046
L       GPR_0046,#417 8(GPR_0036,&GPR_0032)
LA      GPR_0036,#418 4(GPR_0036)
AGR     GPR_0046,GPR_0039
CLGRJ   GPR_0036,GPR_0043,Label L0037,MASK5(mask=0x4),
assocreg
Label L0038:     # (End of internal control flow)

Byte array

------------------------------
 n9n      (  0)  treetop                                                                              [     0x3ff48f19910] bci=[-1,6,32] rc=0 vc=95 vn=- li=2 udi=- nc=1
 n8n      (  1)    icall  jdk/internal/util/ArraysSupport.vectorizedHashCode(Ljava/lang/Object;IIII)I[#408  static Method] [flags 0x500 0x0 ] (in GPR_0025) ()  [     0x3ff48f198c0] bci=[-1,6,32] rc=1 vc=95 vn=- li=2 udi=- nc=5 flg=
 n3n      (  0)      ==>aRegLoad (in &GPR_0016)
 n4n      (  0)      ==>iRegLoad (in GPR_0017) (cannotOverflow )
 n5n      (  0)      ==>iRegLoad (in GPR_0018) (cannotOverflow )
 n6n      (  0)      iload  <parm 3 I>[#402  Parm] [flags 0xc0000103 0x0 ] (in GPR_0025) (cannotOverflow )  [     0x3ff48f19820] bci=[-1,3,32] rc=0 vc=95 vn=- li=2 udi=- nc=0 flg=0x1000
 n7n      (  0)      iconst 8 (X!=0 X>=0 )                                                            [     0x3ff48f19870] bci=[-1,4,32] rc=0 vc=95 vn=- li=2 udi=- nc=0 flg=0x104
------------------------------

LGFR    GPR_0017,GPR_0017
LGFR    GPR_0018,GPR_0018
L       GPR_0025, Parm[<parm 3 I>] ?+24(GPR5)
Label L0016:     # (Start of internal control flow)
LA      GPR_0022,#412 0(GPR_0018,GPR_0017)
CLGIJ   GPR_0018,Label L0020,4,BM(mask=0x4),
Label L0017:
SLGFI   GPR_0022,3
LARL    GPR_0026, &<LiteralPool Base Address>     ; LoadLitPool
VL      VRF_0019,#413 =X(0) 0(GPR_0026),0
VGBM    VRF_0020,0x0
VLVGF   VRF_0020,GPR_0025,#414 3
Label L0018:
VLLEZF  VRF_0021,#415 8(GPR_0017,&GPR_0016)
VUPHB   VRF_0021,VRF_0021
VUPLHW  VRF_0021,VRF_0021
VMALF   VRF_0020,VRF_0020,VRF_0019,VRF_0021
LA      GPR_0017,#416 4(GPR_0017)
CLGRJ   GPR_0017,GPR_0022,Label L0018,MASK5(mask=0x4),
Label L0019:
LARL    GPR_0027, &<LiteralPool Base Address>     ; LoadLitPool
VL      VRF_0019,#413 =X(0) 0(GPR_0027),0
VMLF    VRF_0020,VRF_0020,VRF_0019
VGBM    VRF_0019,0x0
VSUMQF  VRF_0019,VRF_0020,VRF_0019
LA      GPR_0022,#417 3(GPR_0022)
VLGVF   GPR_0025,VRF_0019,#418 3
Label L0020:     # (End of internal control flow)
CLGRJ   GPR_0017,GPR_0022,Label L0022,MASK11(mask=0xa),
Label L0021:     # (Start of internal control flow)
SLLK    GPR_0018,GPR_0025,5
SGR     GPR_0018,GPR_0025
LB      GPR_0025,#419 8(GPR_0017,&GPR_0016)
LA      GPR_0017,#420 1(GPR_0017)
AGR     GPR_0025,GPR_0018
CLGRJ   GPR_0017,GPR_0022,Label L0021,MASK5(mask=0x4),
assocreg
Label L0022:     # (End of internal control flow)

@r30shah
Copy link
Contributor

r30shah commented May 24, 2024

jenkins test sanity zlinux jdk17,jdk21

@r30shah
Copy link
Contributor

r30shah commented May 24, 2024

@Spencer-Comin Given that this is accelerating the Arrays.hashcode implementation in JDK21 and onwards - I think it would be good idea to evaluate the performance through a benchmark (JMH based / bumblebench) to see how much speed up we got

@Spencer-Comin
Copy link
Contributor Author

Here's some benchmarking results for Arrays.hashcode on 10,000,000 element arrays:

Element Type Baseline (ops/s) Accelerated (ops/s) Acceleration
byte 121.008 416.066 343.8%
char 119.725 402.715 336.4%
int 125.050 406.709 325.2%
short 107.173 400.004 373.2%

We are getting a 3-4x improvement, which makes sense given that we go from consuming 1 element at a time in the scalar path to 4 at a time in the vector path.

@r30shah
Copy link
Contributor

r30shah commented May 29, 2024

jenkins test sanity zlinux jdk17

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I see JDK17 failed with some infra related issue so launched the test again,

Is the Instruction selection have any dependency condition being generated ? If yes I do not see that in the snippet you pasted.

@Spencer-Comin
Copy link
Contributor Author

Yes, I forgot to include the dependency conditions from the log. Here's the integer array snippet again with the dependency conditions:

------------------------------
 n9n      (  0)  treetop                                                                              [     0x3ff48f19910] bci=[-1,6,40] rc=0 vc=106 vn=- li=2 udi=- nc=1
 n8n      (  1)    icall  jdk/internal/util/ArraysSupport.vectorizedHashCode(Ljava/lang/Object;IIII)I[#408  static Method] [flags 0x500 0x0 ] (in GPR_0046) ()  [     0x3ff48f198c0] bci=[-1,6,40] rc=1 vc=106 vn=- li=2 udi=- nc=5 flg=0x20
 n3n      (  0)      aload  <parm 0 Ljava/lang/Object;>[#399  Parm] [flags 0xc0000107 0x0 ] (in &GPR_0032)  [     0x3ff48f19730] bci=[-1,0,40] rc=0 vc=106 vn=- li=2 udi=- nc=0
 n4n      (  0)      iload  <parm 1 I>[#400  Parm] [flags 0xc0000103 0x0 ] (in GPR_0036) (cannotOverflow )  [     0x3ff48f19780] bci=[-1,1,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x1000
 n5n      (  0)      iload  <parm 2 I>[#401  Parm] [flags 0xc0000103 0x0 ] (in GPR_0039) (cannotOverflow )  [     0x3ff48f197d0] bci=[-1,2,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x1000
 n6n      (  0)      iload  <parm 3 I>[#402  Parm] [flags 0xc0000103 0x0 ] (in GPR_0046) (cannotOverflow )  [     0x3ff48f19820] bci=[-1,3,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x1000
 n7n      (  0)      iconst 10 (X!=0 X>=0 )                                                           [     0x3ff48f19870] bci=[-1,4,40] rc=0 vc=106 vn=- li=2 udi=- nc=0 flg=0x104
------------------------------

LG      &GPR_0032, Parm[<parm 0 Ljava/lang/Object;>] ?+0(GPR5)
L       GPR_0036, Parm[<parm 1 I>] ?+8(GPR5)
L       GPR_0039, Parm[<parm 2 I>] ?+16(GPR5)
LGFR    GPR_0036,GPR_0036
LGFR    GPR_0039,GPR_0039
L       GPR_0046, Parm[<parm 3 I>] ?+24(GPR5)
Label L0032:     # (Start of internal control flow)
SLLG    GPR_0036,GPR_0036,2
SLLG    GPR_0039,GPR_0039,2
LA      GPR_0043,#410 0(GPR_0039,GPR_0036)
CLGIJ   GPR_0039,Label L0036,16,BM(mask=0x4),
Label L0033:
SLGFI   GPR_0043,12
LARL    GPR_0047, &<LiteralPool Base Address>     ; LoadLitPool
VL      VRF_0040,#411 =X(0) 0(GPR_0047),0
VGBM    VRF_0041,0x0
VLVGF   VRF_0041,GPR_0046,#412 3
Label L0034:
VL      VRF_0042,#413 8(GPR_0036,&GPR_0032),0
VMALF   VRF_0041,VRF_0041,VRF_0040,VRF_0042
LA      GPR_0036,#414 16(GPR_0036)
CLGRJ   GPR_0036,GPR_0043,Label L0034,MASK5(mask=0x4),
Label L0035:
LARL    GPR_0048, &<LiteralPool Base Address>     ; LoadLitPool
VL      VRF_0040,#411 =X(0) 0(GPR_0048),0
VMLF    VRF_0041,VRF_0041,VRF_0040
VGBM    VRF_0040,0x0
VSUMQF  VRF_0040,VRF_0041,VRF_0040
LA      GPR_0043,#415 12(GPR_0043)
VLGVF   GPR_0046,VRF_0040,#416 3
Label L0036:     # (End of internal control flow)
CLGRJ   GPR_0036,GPR_0043,Label L0038,MASK11(mask=0xa),
Label L0037:     # (Start of internal control flow)
SLLK    GPR_0039,GPR_0046,5
SGR     GPR_0039,GPR_0046
L       GPR_0046,#417 8(GPR_0036,&GPR_0032)
LA      GPR_0036,#418 4(GPR_0036)
AGR     GPR_0046,GPR_0039
CLGRJ   GPR_0036,GPR_0043,Label L0037,MASK5(mask=0x4),
assocreg
Label L0038:     # (End of internal control flow)
 POST:
 {AssignAny:GPR_0047:R} {AssignAny:GPR_0048:R} {AssignAny:&GPR_0032:R} {AssignAny:GPR_0036:R} {AssignAny:GPR_0039:R} {AssignAny:GPR_0046:R} {AssignAny:GPR_0043:R} {AssignAny:VRF_0040:R}
 {AssignAny:VRF_0041:R} {AssignAny:VRF_0042:R}

@r30shah
Copy link
Contributor

r30shah commented May 30, 2024

Hi @Spencer-Comin All the tests have been passed, so going to merge this one. One odd think I am seeing with the generated instructions is the Load and LGFR instruction.
I do not think so this is what your code is going (Can you confirm) so is there some missing opportunity here?

@r30shah r30shah merged commit fce37e9 into eclipse-openj9:master May 30, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants