Enhance ADC scoring with SIMD Vector API and add comprehensive tests #3167

Open
ajw711 wants to merge 3 commits into opensearch-project:main from ajw711:perf/simd-adc-vector-api

ajw711 commented Mar 14, 2026

Description

Resolves #3150

Implements SIMD optimization for l2SquaredADC and innerProductADC using Java Vector API (FloatVector.SPECIES_PREFERRED) as suggested in the linked issue.
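For readers unfamiliar with ADC scoring, here is a minimal scalar sketch of what the two methods compute: a float query vector is compared against a document vector stored as one bit per dimension. The class name `ScalarAdcSketch`, the helper `bitAt`, and the most-significant-bit-first layout are assumptions made for illustration; the plugin's actual code may differ.

```java
/** Scalar reference for ADC scoring: float query vs. 1-bit-per-dimension document. */
final class ScalarAdcSketch {
    private ScalarAdcSketch() {}

    // Assumption: dimension i lives in byte i / 8, most significant bit first.
    static int bitAt(byte[] packed, int i) {
        return (packed[i >> 3] >> (7 - (i & 7))) & 1;
    }

    // Sum of squared differences between each unpacked bit (0 or 1) and the query value.
    static float l2SquaredADC(float[] query, byte[] packed) {
        float sum = 0f;
        for (int i = 0; i < query.length; i++) {
            float diff = bitAt(packed, i) - query[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Sum of the query values at positions where the document bit is 1.
    static float innerProductADC(float[] query, byte[] packed) {
        float sum = 0f;
        for (int i = 0; i < query.length; i++) {
            sum += bitAt(packed, i) * query[i];
        }
        return sum;
    }
}
```

Both loops do one shift, one mask, and one multiply-add per dimension, which is the per-element work a SIMD version spreads across lanes.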

Key Improvements & Stability Considerations:

  • Robust Build Configuration: Configured --add-modules jdk.incubator.vector and --enable-preview globally in build.gradle using tasks.withType. This ensures the Vector API is consistently injected not only during JavaCompile but also during Test and JavaExec phases, preventing unexpected NoClassDefFoundError at runtime.
  • Comprehensive Testing: Added detailed unit tests covering bit-level operations, multi-byte arrays, and specifically a large 1024-dimension edge case. These exercise both the SIMD loop bounds and the tail-loop logic on large inputs.
  • Documentation: Updated CHANGELOG.md to reflect the enhancement.

Related Issues

Resolves #3150

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ajw711 ajw711 force-pushed the perf/simd-adc-vector-api branch from d6a2fab to 01a31cc Compare March 14, 2026 03:14
ajw711 (Author) commented Mar 14, 2026

Hi @navneet1v and team,

I've implemented the SIMD optimization for l2SquaredADC and innerProductADC in this PR.

Key highlights compared to the existing implementation:

  • Common helper: unpackBitsToFloatVector eliminates code duplication between both methods
  • Bit extraction strategy: Packs multiple bytes into a single int and broadcasts across all SIMD lanes via IntVector.broadcast + pre-computed SHIFT_VECTOR, avoiding per-element scalar operations
  • FMA optimization: Uses fma (fused multiply-add) for single-instruction accumulation in both methods
  • All SIMD widths supported: lanes=8 (AVX2), lanes=16 (AVX-512), lanes=32 without branching
  • Comprehensive tests: Covers bit-level correctness, multi-byte arrays, tail loop, and 1024-dimension vectors
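The bit extraction strategy in the highlights above can be illustrated without the incubating module. The sketch below is a scalar emulation, not the PR's code: plain arrays stand in for SIMD lanes, `shift` plays the role of the pre-computed SHIFT_VECTOR, and `Math.fma` stands in for the lane-wise fused multiply-add. The class name `LaneEmulationSketch` and the most-significant-bit-first layout are assumptions.

```java
/** Scalar emulation of "pack bytes into an int, broadcast, shift per lane, mask". */
final class LaneEmulationSketch {
    private LaneEmulationSketch() {}

    static float l2SquaredADC(float[] query, byte[] packed, int lanes) {
        if (lanes < 8 || lanes > 32 || lanes % 8 != 0) {
            throw new IllegalArgumentException("lanes must be 8, 16, 24, or 32");
        }
        // Per-lane shift amounts: lane 0 reads the highest bit of the packed word.
        int[] shift = new int[lanes];
        for (int l = 0; l < lanes; l++) {
            shift[l] = lanes - 1 - l;
        }
        float[] acc = new float[lanes];   // one accumulator per emulated lane
        int bytesPerStep = lanes / 8;
        int bound = query.length - (query.length % lanes);
        int i = 0;
        for (; i < bound; i += lanes) {
            // Pack lanes / 8 bytes into one int, first byte in the highest bits.
            int word = 0;
            int firstByte = i >> 3;
            for (int b = 0; b < bytesPerStep; b++) {
                word = (word << 8) | (packed[firstByte + b] & 0xFF);
            }
            // "Broadcast" the word to every lane, then shift and mask per lane.
            for (int l = 0; l < lanes; l++) {
                float bit = (word >>> shift[l]) & 1;
                float diff = bit - query[i + l];
                acc[l] = Math.fma(diff, diff, acc[l]);
            }
        }
        // Horizontal reduction of the lane accumulators.
        float sum = 0f;
        for (float a : acc) {
            sum += a;
        }
        // Tail loop for dimensions that do not fill a whole vector.
        for (; i < query.length; i++) {
            int bit = (packed[i >> 3] >> (7 - (i & 7))) & 1;
            float diff = bit - query[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Because the shift table is derived from the lane count, the same loop body serves every width, which is roughly what "without branching" refers to in the highlight.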

Happy to receive any feedback!

@0ctopus13prime (Collaborator)

Thank you @ajw711
Taking a look and will get back to you!

@finnroblin (Contributor)

Thank you for the PR @ajw711 ! The logic looks correct.

3 comments:

  1. For the tests, can we please add a more challenging unit test in addition to the all-0s and all-1s cases? I am thinking we could add a randomized test where a bit vector is generated randomly and the SIMD and non-SIMD ADC results are compared. This will cover more edge cases.
  2. We should fall back to the non-SIMD implementation if the SIMD module is not available.
  3. Instead of using the --enable-preview flag, we should use reflection, similar to MemorySegmentAddressExtractorUtil, which handles the different cases where MemorySegment is not available, incubating, or part of the JDK core. More details below:

Since OpenSearch 3.x supports JDK 21, can we use reflection to access the preview SIMD methods instead of --enable-preview? I tried compiling with JDK 25 and got an error related to the --enable-preview flag:

(base) finnrobl@80a997329e07 k-NN % ./gradlew compileJava

> Configure project :
Java home directory used by gradle: /Library/Java/JavaVirtualMachines/amazon-corretto-25.jdk/Contents/Home
CMake command: cmake -S jni -B jni/build -DKNN_PLUGIN_VERSION=3.6.0-SNAPSHOT -DCOMMIT_LIB_PATCHES=true -DAPPLY_LIB_PATCHES=true -DCMAKE_POLICY_VERSION_MINIMUM=3.5
Build command: cmake --build jni/build --target opensearchknn_faiss opensearchknn_common opensearchknn_nmslib opensearchknn_simd --parallel 1
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 9.2.0
  OS Info               : Mac OS X 15.7.2 (aarch64)
  JDK Version           : 25 (Amazon Corretto JDK 25 (25.0.1+8-LTS))
  JAVA_HOME             : /Library/Java/JavaVirtualMachines/amazon-corretto-25.jdk/Contents/Home
  Random Testing Seed   : E1BA3A923AD4BB54
  Crypto Standard       : any-supported
=======================================

> Task :compileJava FAILED
error: invalid source release 21 with --enable-preview
  (preview language features are only supported for release 25)
1 error

[Incubating] Problems report is available at: file:///Users/finnrobl/Documents/k-NN-2/k-NN/build/reports/problems/problems-report.html

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler output below.
  error: invalid source release 21 with --enable-preview
    (preview language features are only supported for release 25)
  1 error

There's an example in some code from @0ctopus13prime that uses the MemorySegment API, which is a preview feature in JDK 21 and a standard part of the JDK since JDK 22. Refs:

public final class MemorySegmentAddressExtractorJDK21 extends AbstractMemorySegmentAddressExtractor {
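
The randomized comparison requested in the first review comment could be sketched roughly as follows. Everything here is hypothetical: `RandomAdcDifferentialSketch`, `AdcFunction`, and `check` are illustrative names, and `byteWiseL2` is only a stand-in for the SIMD implementation so that the example runs without the incubating module. A relative tolerance is used because FMA and a different summation order can change the low bits of a float sum.

```java
import java.util.Random;

/** Differential test sketch: random inputs, reference vs. candidate implementation. */
final class RandomAdcDifferentialSketch {
    private RandomAdcDifferentialSketch() {}

    interface AdcFunction {
        float apply(float[] query, byte[] packed);
    }

    // Reference: one dimension at a time, most significant bit first (assumed layout).
    static float scalarL2(float[] q, byte[] p) {
        float sum = 0f;
        for (int i = 0; i < q.length; i++) {
            float diff = ((p[i >> 3] >> (7 - (i & 7))) & 1) - q[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Stand-in candidate: one byte at a time with FMA, so the summation order differs.
    static float byteWiseL2(float[] q, byte[] p) {
        float sum = 0f;
        for (int b = 0; b * 8 < q.length; b++) {
            int bits = p[b] & 0xFF;
            int end = Math.min(8, q.length - b * 8);
            float partial = 0f;
            for (int k = 0; k < end; k++) {
                float diff = ((bits >> (7 - k)) & 1) - q[b * 8 + k];
                partial = Math.fma(diff, diff, partial);
            }
            sum += partial;
        }
        return sum;
    }

    /** Runs the given number of random rounds; throws AssertionError on the first mismatch. */
    static int check(AdcFunction reference, AdcFunction candidate, long seed, int rounds) {
        Random rnd = new Random(seed);   // fixed seed keeps failures reproducible
        for (int r = 0; r < rounds; r++) {
            int dim = 1 + rnd.nextInt(1024);          // includes sizes that hit the tail loop
            float[] q = new float[dim];
            for (int i = 0; i < dim; i++) {
                q[i] = rnd.nextFloat() * 2f - 1f;
            }
            byte[] p = new byte[(dim + 7) / 8];
            rnd.nextBytes(p);
            float expected = reference.apply(q, p);
            float actual = candidate.apply(q, p);
            float tolerance = 1e-3f * Math.max(1f, Math.abs(expected));
            if (Math.abs(expected - actual) > tolerance) {
                throw new AssertionError("round " + r + ", dim " + dim
                    + ": expected " + expected + " but got " + actual);
            }
        }
        return rounds;
    }
}
```

In the plugin's own test suite the seed would more naturally come from the test framework's random source than from a hard-coded value.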

navneet1v and others added 2 commits March 16, 2026 21:52
Signed-off-by: Navneet Verma <navneev@amazon.com>
… SIMD reflection

Signed-off-by: An Jinwon <ajw711@naver.com>
ajw711 (Author) commented Mar 17, 2026

Thanks for the review.

To fix the JDK 25 build error, I removed the --enable-preview flag from build.gradle and only kept the required module (--add-modules jdk.incubator.vector).

I also added a reflection fallback in KNNScoringUtil. The SIMD class remains in the build, but if the Vector API is unavailable at runtime, it safely falls back to the scalar logic.
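
One way such a runtime fallback can be structured is sketched below. This is not the code in this PR: `AdcScorerLoaderSketch`, `AdcScorer`, `ScalarScorer`, and the class name `org.example.SimdAdcScorer` are hypothetical. The idea is to keep every reference to `jdk.incubator.vector` inside a class that is only loaded reflectively, after checking that the module was resolved at startup.

```java
/** Sketch of selecting a SIMD scorer via reflection, with a scalar fallback. */
final class AdcScorerLoaderSketch {
    private AdcScorerLoaderSketch() {}

    interface AdcScorer {
        float l2SquaredADC(float[] query, byte[] packed);
        String name();
    }

    static final class ScalarScorer implements AdcScorer {
        @Override
        public float l2SquaredADC(float[] query, byte[] packed) {
            float sum = 0f;
            for (int i = 0; i < query.length; i++) {
                // Assumed layout: most significant bit first within each byte.
                float diff = ((packed[i >> 3] >> (7 - (i & 7))) & 1) - query[i];
                sum += diff * diff;
            }
            return sum;
        }

        @Override
        public String name() {
            return "scalar";
        }
    }

    // Hypothetical name of a class that uses jdk.incubator.vector internally.
    static final String SIMD_IMPL = "org.example.SimdAdcScorer";

    // True only when the JVM was started with --add-modules jdk.incubator.vector.
    static boolean vectorApiPresent() {
        return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    }

    static AdcScorer load() {
        if (vectorApiPresent()) {
            try {
                Class<?> impl = Class.forName(SIMD_IMPL);
                return (AdcScorer) impl.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
                // SIMD class missing or unusable on this JVM: use the scalar path below.
            }
        }
        return new ScalarScorer();
    }
}
```

Resolving the scorer once, for example into a static final field, keeps the reflective lookup off the scoring hot path.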

Please let me know if this approach looks okay or if further changes are needed.

Development

Successfully merging this pull request may close these issues.

[Enhancement] Integrate SIMD with ADC distance computation

4 participants