[Enhancement] Integrate SIMD with ADC distance computation · Issue #3150 · opensearch-project/k-NN

Description

While reading through the code of KNN what I found was ADC feature is not using SIMD instruction set for doing distance computation. Creating this gh issue to track the SIMD implementation for ADC. I just see bunch of TODOs in the code

Code Ref which needs a change:

k-NN/src/main/java/org/opensearch/knn/plugin/script/KNNScoringUtil.java

Lines 114 to 175 in 0d05a10

    
               /** 
        
                * Calculates the L2 squared distance between a float query vector and a binary document vector using ADC (Asymmetric Distance Computation). 
        
                * This method implements a specialized version of L2 distance calculation where one vector is in binary format (compressed) 
        
                * and the other is in float format (uncompressed). 
        
                * 
        
                * TODO: For now this will be very inefficient. We can in the future optimize through the following: 
        
                * Use FloatVector.SPECIES_PREFERRED for SIMD processing in chunks with reduceLanes(), 
        
                * Configure build.gradle with --add-modules jdk.incubator.vector --enable-preview flags into the VectorUtils class. 
        
                * 
        
                * @param queryVector The uncompressed query vector in float format 
        
                * @param inputVector The compressed document vector in binary format, where each bit represents a dimension 
        
                * @return The L2 squared distance between the two vectors. Lower values indicate closer vectors. 
        
                * @throws IllegalArgumentException if queryVector length is not compatible with inputVector length (queryVector.length != inputVector.length * 8) 
        
                */ 
        
               public static float l2SquaredADC(float[] queryVector, byte[] inputVector) { 
        
                   // we cannot defer to VectorUtil as it does not support ADC. 
        
                   // TODO: add batching logic similar to C++ to improve performance. 
        
                   float score = 0; 
        
                   for (int i = 0; i < queryVector.length; ++i) { 
        
                       int byteIndex = i / 8; 
        
                       int bitOffset = 7 - (i % 8); 
        
                       int bitValue = (inputVector[byteIndex] >> bitOffset) & 1; 
        
                       // Calculate squared difference 
        
                       float diff = bitValue - queryVector[i]; 
        
                       score += diff * diff; 
        
                   } 
        
                   return score; 
        
               } 
        
               /** 
        
                * Calculates the inner product similarity between a float query vector and a binary document vector using ADC 
        
                * (Asymmetric Distance Computation). This method is useful for similarity searches where one vector is compressed 
        
                * in binary format and the other remains in float format. 
        
                * 
        
                * The inner product is calculated by summing the products of corresponding elements, where the binary vector's 
        
                * elements are interpreted as 0 or 1. 
        
                * 
        
                * TODO: For now this will be very inefficient. We can in the future optimize through the following: 
        
                * Use FloatVector.SPECIES_PREFERRED for SIMD processing in chunks with reduceLanes(), 
        
                * Configure build.gradle with --add-modules jdk.incubator.vector --enable-preview flags into the VectorUtils class. 
        
                * 
        
                * @param queryVector The uncompressed query vector in float format 
        
                * @param inputVector The compressed document vector in binary format, where each bit represents a dimension 
        
                * @return The inner product similarity score between the two vectors. Higher values indicate more similar vectors. 
        
                * @throws IllegalArgumentException if queryVector length is not compatible with inputVector length (queryVector.length != inputVector.length * 8) 
        
                */ 
        
               public static float innerProductADC(float[] queryVector, byte[] inputVector) { 
        
                   float score = 0; 
        
                   for (int i = 0; i < queryVector.length; ++i) { 
        
                       // Extract the bit for this dimension 
        
                       int byteIndex = i / 8; 
        
                       int bitOffset = 7 - (i % 8); 
        
                       int bitValue = (inputVector[byteIndex] >> bitOffset) & 1; 
        
                       // Calculate product and accumulate 
        
                       score += bitValue * queryVector[i]; 
        
                   } 
        
                   return score; 
        
               }

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Integrate SIMD with ADC distance computation #3150

Description

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	/**
	* Calculates the L2 squared distance between a float query vector and a binary document vector using ADC (Asymmetric Distance Computation).
	* This method implements a specialized version of L2 distance calculation where one vector is in binary format (compressed)
	* and the other is in float format (uncompressed).
	*
	* TODO: For now this will be very inefficient. We can in the future optimize through the following:
	* Use FloatVector.SPECIES_PREFERRED for SIMD processing in chunks with reduceLanes(),
	* Configure build.gradle with --add-modules jdk.incubator.vector --enable-preview flags into the VectorUtils class.
	*
	* @param queryVector The uncompressed query vector in float format
	* @param inputVector The compressed document vector in binary format, where each bit represents a dimension
	* @return The L2 squared distance between the two vectors. Lower values indicate closer vectors.
	* @throws IllegalArgumentException if queryVector length is not compatible with inputVector length (queryVector.length != inputVector.length * 8)
	*/
	public static float l2SquaredADC(float[] queryVector, byte[] inputVector) {
	// we cannot defer to VectorUtil as it does not support ADC.
	// TODO: add batching logic similar to C++ to improve performance.
	float score = 0;

	for (int i = 0; i < queryVector.length; ++i) {
	int byteIndex = i / 8;
	int bitOffset = 7 - (i % 8);
	int bitValue = (inputVector[byteIndex] >> bitOffset) & 1;

	// Calculate squared difference
	float diff = bitValue - queryVector[i];
	score += diff * diff;
	}
	return score;
	}

	/**
	* Calculates the inner product similarity between a float query vector and a binary document vector using ADC
	* (Asymmetric Distance Computation). This method is useful for similarity searches where one vector is compressed
	* in binary format and the other remains in float format.
	*
	* The inner product is calculated by summing the products of corresponding elements, where the binary vector's
	* elements are interpreted as 0 or 1.
	*
	* TODO: For now this will be very inefficient. We can in the future optimize through the following:
	* Use FloatVector.SPECIES_PREFERRED for SIMD processing in chunks with reduceLanes(),
	* Configure build.gradle with --add-modules jdk.incubator.vector --enable-preview flags into the VectorUtils class.
	*
	* @param queryVector The uncompressed query vector in float format
	* @param inputVector The compressed document vector in binary format, where each bit represents a dimension
	* @return The inner product similarity score between the two vectors. Higher values indicate more similar vectors.
	* @throws IllegalArgumentException if queryVector length is not compatible with inputVector length (queryVector.length != inputVector.length * 8)
	*/
	public static float innerProductADC(float[] queryVector, byte[] inputVector) {
	float score = 0;

	for (int i = 0; i < queryVector.length; ++i) {
	// Extract the bit for this dimension
	int byteIndex = i / 8;
	int bitOffset = 7 - (i % 8);
	int bitValue = (inputVector[byteIndex] >> bitOffset) & 1;

	// Calculate product and accumulate
	score += bitValue * queryVector[i];
	}
	return score;
	}

[Enhancement] Integrate SIMD with ADC distance computation #3150

Description

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions