
[RFC] Dynamic Quantization In Vector Indices #205

@sam-herman


[RFC] Dynamic Quantization for In-Place Index Updates


Problem Statement

As we work on in-place updating indices for jVector (see #169), one major challenge is how to handle quantization efficiently.

Background on Quantization:
Quantization converts vectors into compressed representations to save memory and improve performance. jVector supports several quantization methods, including Product Quantization (PQ). However, quantizers must be trained on representative data, and as new vectors are added to the index, a quantizer trained on an earlier snapshot of the data may become less optimal for the evolving dataset.

The Core Challenge:
Currently, adding new vectors requires rebuilding the entire index from scratch to retrain quantizers. This is inefficient and defeats the purpose of in-place updates. We need a mechanism to:

  1. Continuously adapt quantizers as new data arrives
  2. Detect when an adapted quantizer provides better compression than the current one
  3. Switch to better quantizers without full index reconstruction
  4. Measure quantization quality in a way that generalizes across different quantization methods

Motivation

The goal is to enable efficient in-place index updates while maintaining search quality and performance. By implementing adaptive quantization with reconstruction error tracking, we can:

  1. Eliminate unnecessary rebuilds: Only switch quantizers when reconstruction error demonstrably improves
  2. Improve update performance: Allow incremental updates without full index reconstruction
  3. Maintain search quality: Ensure that quantization quality improves or remains stable over time
  4. Support multiple quantization methods: Use reconstruction error as a universal quality metric
  5. Enable parameter optimization: Adjust quantizer parameters (e.g., number of sub-dimensions) based on reconstruction error
  6. Optimize resource usage: Minimize computational overhead while maximizing compression quality

This feature is critical for production deployments where indices must be updated frequently while maintaining acceptable recall and cost.

Scope

This RFC covers the following:

  • Multi-quantizer management: Maintain multiple quantizers simultaneously (active and candidate)
  • Online quantizer adaptation: Continuously refine candidate quantizers as vectors are added or removed
  • Reconstruction error tracking: Monitor reconstruction error for both active and candidate quantizers
  • Automatic quantizer switching: Switch to candidate quantizer when reconstruction error improves
  • Parameter optimization: Adjust quantizer parameters (e.g., number of sub-dimensions) based on reconstruction error
  • Metadata management: Store quantizer versions, reconstruction error metrics, and parameters in index metadata
  • Generalized quantization support: Design that works with Product Quantization, Scalar Quantization, and other methods

Out of Scope

The following are explicitly out of scope for this RFC:

  • Distributed quantizer computation across multiple nodes
  • User-configurable reconstruction error thresholds (will use sensible defaults initially)
  • Automatic rollback mechanisms if quantizer switch fails
  • More than two simultaneous quantizers (future enhancement)
  • Performance optimizations for quantizer training itself
  • Quantizer compression/decompression algorithms (handled by jVector library)

Proposed Design

Overview

We will build on the existing online PQ refinement work (#137) and leverage jVector's reconstruction error tracking capabilities (jVector #565) to implement an adaptive multi-quantizer system.

The design consists of four main components:

  1. Multi-Quantizer Architecture: Maintain two quantizers simultaneously - an active quantizer used for graph construction and a candidate quantizer that is continuously refined
  2. Reconstruction Error Tracking: Monitor reconstruction error for both quantizers to measure compression quality
  3. Automatic Switching: Replace the active quantizer with the candidate when reconstruction error improves
  4. Parameter Optimization: Adjust quantizer parameters (e.g., number of sub-dimensions) based on reconstruction error trends

Key Insight: Instead of relying on PQ-specific cost-tracking proxies such as centroid drift to detect quality degradation, we directly measure reconstruction error, the distance between an original vector and its reconstruction from the quantized form. This metric:

  • Works across all quantization methods (PQ, scalar quantization, etc.)
  • Directly correlates with search quality
  • Can be efficiently computed within the jVector quantizer
  • Enables data-driven parameter optimization

Multi-Quantizer Architecture

A key component in this architecture is the multi-quantizer approach. In the simplest case, the system maintains two quantizers simultaneously:

Active Quantizer

  • Purpose: Currently used for graph construction and search
  • Stability: Remains unchanged until a better candidate is available
  • Usage: All new vectors added to the graph are quantized using this quantizer
  • Persistence: Stored in index metadata and loaded on index open

Candidate Quantizer

  • Purpose: Continuously refined as new vectors are added
  • Adaptation: Updated using online learning techniques from #137
  • Evaluation: Reconstruction error is tracked to assess quality
  • Promotion: Becomes the active quantizer when reconstruction error is demonstrably better

Metadata Structure:

```json
{
  "quantization_metadata": {
    "active_quantizer": {
      "version": 1,
      "type": "product_quantization",
      "parameters": {
        "num_subquantizers": 8,
        "bits_per_subquantizer": 8,
        "distance_metric": "euclidean"
      },
      "data": "...",
      "reconstruction_error": {
        "mean": 0.15,
        "std_dev": 0.08,
        "max": 0.42,
        "percentile_95": 0.28,
        "sample_size": 10000
      },
      "created_timestamp": "2024-11-11T10:00:00Z",
      "vectors_quantized": 50000
    },
    "candidate_quantizer": {
      "version": 2,
      "type": "product_quantization",
      "parameters": {
        "num_subquantizers": 8,
        "bits_per_subquantizer": 8,
        "distance_metric": "euclidean"
      },
      "data": "...",
      "reconstruction_error": {
        "mean": 0.12,
        "std_dev": 0.06,
        "max": 0.35,
        "percentile_95": 0.22,
        "sample_size": 10000
      },
      "created_timestamp": "2024-11-11T12:00:00Z",
      "vectors_adapted": 15000
    },
    "last_switch_timestamp": "2024-11-11T10:00:00Z",
    "switch_count": 3
  }
}
```

Version Management:

  • Each quantizer has a monotonically increasing version number
  • When candidate becomes active, its version is preserved
  • A new candidate is created with the next version number

Reconstruction Error Tracking

Reconstruction error measures the quality of quantization by comparing original vectors with their quantized representations. This is a direct measure of information loss during quantization.

What is Reconstruction Error?

For a vector v and its quantized representation q(v):

reconstruction_error(v) = distance(v, dequantize(q(v)))

Where:

  • q(v) is the quantized representation
  • dequantize(q(v)) reconstructs an approximation of the original vector
  • distance() uses the same metric as the vector space (Euclidean, cosine, etc.)
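
As a concrete illustration, here is a minimal sketch of reconstruction error for a Euclidean vector space. The class name is illustrative, and the reconstructed vector is assumed to come from the quantizer's decode step (`dequantize(q(v))` above):

```java
// Minimal sketch: reconstruction error as Euclidean distance between the
// original vector and its reconstruction from the quantized code.
public final class ReconstructionError {
    public static double compute(float[] original, float[] reconstructed) {
        double sum = 0.0;
        for (int i = 0; i < original.length; i++) {
            double d = original[i] - reconstructed[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

For cosine-based spaces the same shape applies with the distance function swapped out, matching the rule above that the error uses the same metric as the vector space.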

Reconstruction Error Metrics:

We track the following statistics to characterize quantization quality:

| Metric | Description | Purpose |
|---|---|---|
| Mean Error | Average reconstruction error across sampled vectors | Primary indicator of overall quantization quality |
| Std Dev Error | Standard deviation of reconstruction errors | Indicates consistency of quantization quality |
| Max Error | Maximum reconstruction error observed | Identifies worst-case quantization scenarios |
| 95th Percentile | 95th percentile of reconstruction errors | Robust measure excluding outliers |
| Sample Size | Number of vectors used for error calculation | Indicates statistical confidence |

Error Tracking Implementation:

Leverage jVector's built-in reconstruction error tracking (jVector #565):

```java
// Within the jVector quantizer (sketch; concrete quantizers supply
// dequantize() and distance())
public abstract class AdaptiveQuantizer {
    private final ReconstructionErrorTracker errorTracker =
            new ReconstructionErrorTracker();

    protected abstract float[] dequantize(byte[] quantized);
    protected abstract float distance(float[] a, float[] b);

    public void trackReconstructionError(float[] original, byte[] quantized) {
        float[] reconstructed = dequantize(quantized);
        float error = distance(original, reconstructed);
        errorTracker.addSample(error);
    }

    public ReconstructionErrorStats getErrorStats() {
        return errorTracker.computeStats();
    }
}
```

Sampling Strategy:

  • Track reconstruction error for a representative sample of vectors (e.g., 10,000 vectors)
  • Use reservoir sampling to maintain a fixed-size sample as index grows
  • Periodically refresh samples to capture recent data distribution
  • Compute statistics incrementally to minimize overhead
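
The sampling strategy above can be sketched with standard reservoir sampling plus Welford's incremental mean/variance update. Class and method names here are illustrative, not jVector API; percentiles would be computed from the reservoir snapshot:

```java
import java.util.Arrays;
import java.util.Random;

// Fixed-size reservoir of reconstruction errors with O(1) incremental
// statistics (Welford's algorithm). Illustrative names, not jVector API.
public final class ErrorReservoir {
    private final float[] reservoir;
    private final Random rng = new Random();
    private long seen = 0;      // total samples observed
    private double mean = 0.0;  // running mean (Welford)
    private double m2 = 0.0;    // running sum of squared deviations

    public ErrorReservoir(int capacity) {
        this.reservoir = new float[capacity];
    }

    public void addSample(float error) {
        seen++;
        // Welford update: statistics stay O(1) per sample.
        double delta = error - mean;
        mean += delta / seen;
        m2 += delta * (error - mean);
        // Reservoir sampling: each sample survives with probability capacity/seen.
        if (seen <= reservoir.length) {
            reservoir[(int) (seen - 1)] = error;
        } else {
            long j = (long) (rng.nextDouble() * seen);
            if (j < reservoir.length) reservoir[(int) j] = error;
        }
    }

    public double mean() { return mean; }
    public double stdDev() { return seen < 2 ? 0.0 : Math.sqrt(m2 / (seen - 1)); }
    public long sampleSize() { return seen; }

    // Snapshot of the reservoir, e.g. for percentile estimation.
    public float[] snapshot() {
        return Arrays.copyOf(reservoir, (int) Math.min(seen, reservoir.length));
    }
}
```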

Quantizer Switching Mechanism

The system automatically switches from the active quantizer to the candidate quantizer when reconstruction error demonstrates improvement.

Switching Criteria:

A quantizer switch will be triggered when all of the following conditions are met:

  1. Sufficient sample size: candidate.sample_size >= MIN_SAMPLE_SIZE (e.g., 10,000 vectors)
  2. Mean error improvement: candidate.mean_error < active.mean_error * (1 - MIN_IMPROVEMENT_RATIO) (e.g., 5% improvement)
  3. Statistical significance: Improvement is statistically significant (e.g., using t-test with p < 0.05)
  4. Stability check: Candidate error has been stable for a minimum period (e.g., 1000 vector additions)
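
The four conditions above can be combined into a single predicate. The sketch below uses assumed names and approximates the significance test with a Welch statistic against a fixed 1.96 critical value (roughly p < 0.05 for large samples), rather than a full t-test library:

```java
// Sketch of the switching predicate. Thresholds and the z-style
// significance check are illustrative defaults, not jVector API.
public final class SwitchCriteria {
    static final long MIN_SAMPLE_SIZE = 10_000;
    static final double MIN_IMPROVEMENT_RATIO = 0.05; // 5%
    static final double CRITICAL_VALUE = 1.96;        // ~p < 0.05, large samples

    public static boolean shouldSwitch(
            double activeMean, double activeStd, long activeN,
            double candMean, double candStd, long candN,
            boolean candidateStable) {
        if (candN < MIN_SAMPLE_SIZE) return false;                       // criterion 1
        if (candMean >= activeMean * (1 - MIN_IMPROVEMENT_RATIO))        // criterion 2
            return false;
        // Criterion 3: difference of means over the pooled standard error.
        double se = Math.sqrt(activeStd * activeStd / activeN
                            + candStd * candStd / candN);
        if ((activeMean - candMean) / se < CRITICAL_VALUE) return false;
        return candidateStable;                                          // criterion 4
    }
}
```

With the numbers from the example below (0.15 vs. 0.12, sample sizes 50,000 and 15,000), this predicate returns true.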

Example:

Active quantizer:  mean_error = 0.15, std_dev = 0.08, sample_size = 50000
Candidate quantizer: mean_error = 0.12, std_dev = 0.06, sample_size = 15000

Improvement = (0.15 - 0.12) / 0.15 = 20% > 5% threshold ✓
Sample size = 15000 > 10000 ✓
Statistical significance: t-test p-value = 0.001 < 0.05 ✓
Stability: error stable for last 2000 vectors ✓

→ Trigger quantizer switch

Switching Process:

  1. Validation: Verify switching criteria are met
  2. Logging: Log switch event with reconstruction error metrics for both quantizers
  3. Promotion: Promote candidate to active quantizer
  4. Remediation Strategy Selection: Choose appropriate remediation approach (see below)
  5. New Candidate: Initialize a new candidate quantizer (copy of new active quantizer)
  6. Metadata Update: Update index metadata with new quantizer versions and timestamps
  7. Monitoring: Track switch count and frequency for observability
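
The promotion and new-candidate steps, together with the rules from "Version Management" above, can be sketched as follows (hypothetical types; jVector's actual metadata handling will differ):

```java
// Sketch of candidate promotion with version management: promotion
// preserves the candidate's version, and the fresh candidate takes the
// next version number. Illustrative types, not jVector API.
public final class QuantizerManager {
    static final class Quantizer {
        final int version;
        Quantizer(int version) { this.version = version; }
        Quantizer nextGeneration() { return new Quantizer(version + 1); }
    }

    private Quantizer active = new Quantizer(1);
    private Quantizer candidate = new Quantizer(2);
    private int switchCount = 0;

    public void promoteCandidate() {
        active = candidate;                    // version preserved
        candidate = active.nextGeneration();   // copy with next version number
        switchCount++;
    }

    public int activeVersion() { return active.version; }
    public int candidateVersion() { return candidate.version; }
    public int switchCount() { return switchCount; }
}
```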

Remediation Strategies:

When switching to a new quantizer, there are two strategies for updating the graph, with different cost/benefit tradeoffs:

| Strategy | Description | Cost | Benefit | Use Case |
|---|---|---|---|---|
| Full Reconstruction | Re-quantize all vectors and rebuild entire graph | High (O(n) re-quantization + graph rebuild) | Immediate optimal quality | Large reconstruction error improvement (>15%) |
| Lazy Re-quantization | Switch quantizer but keep existing quantized vectors; graph pruning gradually corrects | Low (O(1) quantizer swap) | Eventual convergence to optimal quality | Small to moderate reconstruction error improvement (<15%) |

Lazy Re-quantization Details:

The lazy approach leverages the graph's self-correcting properties:

  1. Immediate Switch: New vectors are quantized using the new quantizer
  2. Mixed State: Graph temporarily contains vectors quantized with different quantizers
  3. Gradual Correction: Graph pruning cycles progressively improve edge quality
    • Pruning removes suboptimal edges based on current distance calculations
    • New edges are added using the new quantizer
    • Over multiple pruning cycles, graph converges to optimal structure
  4. Convergence: After sufficient pruning cycles, graph quality matches full reconstruction

Advantages of Lazy Approach:

  • No downtime or performance impact during switch
  • Minimal computational cost
  • Suitable for continuous adaptation scenarios
  • Works well when reconstruction error improvement is modest

Disadvantages of Lazy Approach:

  • Temporary quality degradation during convergence period
  • Convergence time depends on pruning frequency and graph dynamics
  • May not be suitable for large reconstruction error improvements

Strategy Selection Logic:

```java
// Sketch: choose remediation strategy from the relative error improvement
if (reconstructionErrorImprovement > 0.15) {
    // Large improvement justifies full reconstruction
    return RemediationStrategy.FULL_RECONSTRUCTION;
} else {
    // Small to moderate improvement: let pruning handle it
    return RemediationStrategy.LAZY_REQUANTIZATION;
}
```

Configuration:

  • Strategy can be configured per index based on workload characteristics
  • Threshold for strategy selection (default 15%) can be tuned based on empirical results
  • Future work may support user-configurable strategy preferences

Avoiding Thrashing:

To prevent frequent switching:

  • Require minimum improvement ratio (e.g., 5%)
  • Enforce minimum time between switches (e.g., 1 hour)
  • Require stability period before switching
  • Track switch frequency and alert if excessive
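
The minimum-interval guard can be layered on top of the switching criteria as a simple throttle (a sketch with illustrative names):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of a thrashing guard: a switch is only allowed if at least
// minInterval has elapsed since the previous one. Illustrative names.
public final class SwitchThrottle {
    private final Duration minInterval;
    private Instant lastSwitch = Instant.MIN;

    public SwitchThrottle(Duration minInterval) {
        this.minInterval = minInterval;
    }

    // Returns true (and records the switch) only if enough time has passed.
    public boolean tryAcquire(Instant now) {
        if (Duration.between(lastSwitch, now).compareTo(minInterval) < 0) {
            return false;
        }
        lastSwitch = now;
        return true;
    }
}
```

A switch then proceeds only when both the criteria predicate and the throttle agree.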

Parameter Optimization

With reconstruction error as a quality metric, we can optimize quantizer parameters to improve compression quality.

Optimizable Parameters:

| Parameter | Description | Impact on Reconstruction Error |
|---|---|---|
| Number of Sub-dimensions (PQ) | How many subquantizers to use | More subquantizers → lower error, higher memory |
| Bits per Sub-dimension (PQ) | Codebook size per subquantizer | More bits → lower error, higher memory |
| Quantization Type | PQ, Scalar, Binary, etc. | Different methods have different error/memory tradeoffs |
| Distance Metric | Euclidean, Cosine, etc. | Affects reconstruction error calculation |
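
The memory side of these tradeoffs is easy to quantify for PQ: an encoded vector occupies num_subquantizers × bits_per_subquantizer bits (a back-of-envelope sketch; codebook storage is excluded):

```java
// Back-of-envelope PQ code size: bits per vector = m subquantizers
// times b bits each. Codebook (centroid) storage is not included.
public final class PqMemory {
    public static double codeBytesPerVector(int numSubquantizers,
                                            int bitsPerSubquantizer) {
        return numSubquantizers * bitsPerSubquantizer / 8.0;
    }
}
```

So doubling the number of subquantizers from 8 to 16 at 8 bits each doubles the per-vector code from 8 to 16 bytes, which is the "lower error, higher memory" tradeoff in the table above.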

Optimization Strategy:

  1. Baseline Establishment: Start with default parameters
  2. Candidate Generation: Create candidate quantizers with different parameters
  3. Parallel Evaluation: Track reconstruction error for multiple candidates simultaneously
  4. Best Selection: Promote candidate with lowest reconstruction error (if improvement threshold met)
  5. Iterative Refinement: Continue exploring parameter space over time

Example Scenario:

Initial configuration: 8 subquantizers, 8 bits each
Active quantizer: mean_error = 0.15

Create candidates:
- Candidate A: 16 subquantizers, 8 bits → mean_error = 0.10 (33% improvement)
- Candidate B: 8 subquantizers, 16 bits → mean_error = 0.12 (20% improvement)
- Candidate C: 12 subquantizers, 8 bits → mean_error = 0.11 (27% improvement)

Select Candidate A (best error reduction)
→ Switch to 16 subquantizers configuration
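
The selection step in the scenario above reduces to picking the candidate with the lowest mean error, subject to the improvement threshold (a sketch; the Candidate holder is illustrative, not jVector API):

```java
import java.util.List;

// Sketch of best-candidate selection for parameter optimization.
public final class CandidateSelector {
    public static final class Candidate {
        public final String name;
        public final double meanError;
        public Candidate(String name, double meanError) {
            this.name = name;
            this.meanError = meanError;
        }
    }

    // Returns the candidate with the lowest mean error, or null when no
    // candidate beats the active quantizer by at least minImprovementRatio.
    public static Candidate selectBest(double activeMeanError,
                                       List<Candidate> candidates,
                                       double minImprovementRatio) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (c.meanError >= activeMeanError * (1 - minImprovementRatio)) continue;
            if (best == null || c.meanError < best.meanError) best = c;
        }
        return best;
    }
}
```

Run on the scenario above (active error 0.15, candidates A/B/C at 0.10/0.12/0.11), this selects Candidate A.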

Constraints:

  • Memory budget: Total quantizer memory must not exceed configured limit
  • Computational budget: Training time must be acceptable
  • Search performance: Quantizer must support efficient distance computation

Future Enhancement:

  • Automated parameter search using Bayesian optimization
  • Multi-objective optimization (error vs. memory vs. speed)
  • A/B testing framework for parameter changes

Implementation Details

Index Metadata Changes

Add new fields to the jVector index metadata to store:

  • Active and candidate quantizer configurations
  • Reconstruction error statistics for both quantizers
  • Quantizer parameters and versions
  • Switch history and timestamps

Online Adaptation Integration

Leverage existing online PQ refinement from #137:

  • Continue to adapt candidate quantizer as vectors are added/removed
  • Track reconstruction error during adaptation using jVector's built-in tracking
  • Periodically evaluate switching criteria
  • Maintain active quantizer unchanged until switch is triggered

Performance Note: The cumulative cost of PQ refinement remains proportional to the number of newly added vectors:

[Figure: cumulative PQ refinement cost, growing linearly with the number of newly added vectors]

Reconstruction Error Calculation

Implement reconstruction error tracking within jVector quantizer (#565):

  • Calculate error for sampled vectors during quantization
  • Use reservoir sampling to maintain representative sample
  • Compute statistics incrementally to minimize overhead
  • Store results in quantizer metadata

Quantizer Switch Coordination

Implement switching logic:

  • Check switching criteria after each adaptation cycle
  • Coordinate switch with index update operations
  • Ensure thread-safety during re-quantization process
  • Maintain consistency between quantizer metadata and quantized vectors

Performance Considerations

Computational Overhead

  • Reconstruction error tracking: O(1) per vector quantization (incremental statistics update)
  • Quantizer adaptation: Proportional to number of newly added vectors (see graph above)
  • Metadata storage: Minimal overhead (few KB per index for both quantizers)
  • Switch frequency: Expected to be infrequent (e.g., every 10K-100K vector additions)
  • Remediation cost: Varies by strategy:
    • Lazy: O(1) - just swap quantizer reference
    • Full reconstruction: O(n log n) - re-quantize + rebuild graph

Memory Impact

  • Two quantizers: ~2× quantizer size (typically < 1MB for PQ with reasonable parameters)
  • Reconstruction error samples: ~10K vectors × sizeof(float) ≈ 40KB per quantizer
  • Metadata: Negligible (< 1KB for statistics and configuration)
  • Total additional memory: < 1% of index size for typical configurations
  • During lazy re-quantization: No additional memory (mixed quantizer state)

Search Performance

  • No impact during normal operation: Reconstruction error tracking happens during indexing
  • During lazy re-quantization: Temporary quality degradation until pruning converges
    • Magnitude depends on reconstruction error improvement
    • Duration depends on pruning frequency
    • Typically converges within a few pruning cycles
  • During full reconstruction: Temporary latency increase but immediate quality improvement
  • This happens only on merges or periodic refreshes if the index is segmentless
  • Long-term: Improved search quality due to better quantization

Testing Strategy

Unit Tests

  • Test multi-quantizer management (active/candidate lifecycle)
  • Test reconstruction error calculation and statistics
  • Test quantizer switching criteria evaluation
  • Test metadata serialization/deserialization for both quantizers
  • Test parameter optimization logic
  • Test remediation strategy selection

Integration Tests

  • Test online adaptation with reconstruction error tracking
  • Test quantizer switching process (both lazy and full reconstruction strategies)
  • Test concurrent updates during quantizer switch
  • Test metadata persistence across index reopens
  • Test mixed quantizer state during lazy re-quantization
  • Test graph convergence after lazy re-quantization

Performance Tests

  • Measure reconstruction error tracking overhead
  • Measure quantizer switch frequency under various workloads
  • Measure search quality during lazy re-quantization convergence
  • Benchmark memory overhead of dual quantizers
  • Compare remediation strategy costs (lazy vs. full reconstruction)
  • Measure pruning cycle convergence time

Validation Tests

  • Verify reconstruction error correlates with search quality (recall@k)
  • Validate switching thresholds through empirical testing
  • Test edge cases (empty index, single vector, etc.)
  • Verify lazy re-quantization converges to same quality as full reconstruction
  • Test parameter optimization improves reconstruction error
  • Validate statistical significance testing for switching decisions

Open Questions

  1. Switching Threshold Tuning: What are the optimal default thresholds for reconstruction error improvement to trigger a switch?
  2. Remediation Strategy Selection: What are the optimal thresholds for selecting between the lazy and full reconstruction strategies?
  3. Convergence Monitoring: How do we measure and monitor convergence during lazy re-quantization?
  4. Pruning Frequency: What is the optimal pruning frequency to ensure timely convergence during lazy re-quantization?
  5. Parameter Search Space: For parameter optimization, what is the appropriate search space and exploration strategy?
  6. Multi-Candidate Support: Should we support more than two quantizers simultaneously for parallel parameter exploration?
  7. User Control: Should users be able to:
    • Manually trigger quantizer switches?
    • Configure switching thresholds?
    • Select remediation strategy?
    • Disable adaptive quantization?
  8. Monitoring: What metrics should be exposed for observability?
    • Reconstruction error trends over time
    • Quantizer switch frequency and reasons
    • Convergence progress during lazy re-quantization
    • Memory usage of dual quantizers
  9. Rollback: Should we support rollback to previous quantizer if switch degrades quality?
  10. Cross-Quantization-Type Switching: Can we switch between different quantization types (e.g., PQ to scalar quantization) or only within the same type?
