fix: schema isn't expected for IVF_PQ #3606

BubbleCal · 2025-03-26T05:17:34Z

We now drop `__ivf_part_id` when shuffling, but there is a corner case when `num_partitions=1`:

1. If `num_partitions=1`, no shuffling is needed.
2. The shuffler reader returns the data directly.
3. As a result, `__ivf_part_id` is not dropped and is written into the index file as well.
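
The corner case can be sketched with a toy reader. This is only an illustration under stated assumptions, not the code changed in this PR: `Batch`, `filter_by_partition`, and `read_partition` are hypothetical names, and columns are plain `u64` vectors rather than Arrow arrays. The point it shows is that the partition-id column is dropped on every path, including the `num_partitions == 1` path that skips shuffling.

```rust
/// Name of the temporary partition-id column (taken from the PR description).
const PART_ID_COLUMN: &str = "__ivf_part_id";

/// Toy stand-in for a record batch: named columns of u64 values.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<(String, Vec<u64>)>,
}

impl Batch {
    /// Names of the columns, in order.
    fn column_names(&self) -> Vec<&str> {
        self.columns.iter().map(|(n, _)| n.as_str()).collect()
    }

    /// Remove the column with the given name, if present.
    fn drop_column(mut self, name: &str) -> Batch {
        self.columns.retain(|(n, _)| n.as_str() != name);
        self
    }

    /// Keep only the rows whose partition id equals `part_id`.
    /// Returns the batch unchanged if it has no partition-id column.
    fn filter_by_partition(self, part_id: u64) -> Batch {
        let ids = self
            .columns
            .iter()
            .find(|(n, _)| n.as_str() == PART_ID_COLUMN)
            .map(|(_, v)| v.clone());
        let Some(ids) = ids else {
            return self;
        };
        let columns = self
            .columns
            .into_iter()
            .map(|(name, values)| {
                let kept: Vec<u64> = values
                    .into_iter()
                    .zip(&ids)
                    .filter(|(_, id)| **id == part_id)
                    .map(|(v, _)| v)
                    .collect();
                (name, kept)
            })
            .collect();
        Batch { columns }
    }
}

/// Hypothetical reader for one partition.
/// With a single partition the rows are passed through without shuffling,
/// but the partition-id column is still dropped before the data is returned.
fn read_partition(input: Vec<Batch>, num_partitions: usize, part_id: u64) -> Vec<Batch> {
    input
        .into_iter()
        .map(|batch| {
            let batch = if num_partitions == 1 {
                batch // no shuffling needed
            } else {
                batch.filter_by_partition(part_id)
            };
            batch.drop_column(PART_ID_COLUMN)
        })
        .collect()
}
```

In this sketch the drop happens after the branch, so neither path can leak the column into what gets written.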

Signed-off-by: BubbleCal <[email protected]>

codecov-commenter · 2025-03-26T07:45:34Z

Codecov Report

Attention: Patch coverage is 92.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.68%. Comparing base (20cde3b) to head (7d3556e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/vector/pq/storage.rs | 95.16% | 1 Missing and 2 partials ⚠️ |
| rust/lance/src/index/vector/ivf/v2.rs | 33.33% | 2 Missing ⚠️ |
| rust/lance-index/src/vector/storage.rs | 90.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3606      +/-   ##
==========================================
+ Coverage   78.67%   78.68%   +0.01%     
==========================================
  Files         258      258              
  Lines       96817    96890      +73     
  Branches    96817    96890      +73     
==========================================
+ Hits        76172    76242      +70     
+ Misses      17578    17576       -2     
- Partials     3067     3072       +5

Flag	Coverage Δ
unittests	`78.68% <92.00%> (+0.01%)`	⬆️


westonpace

Good find. A few small questions because I'm not sure why we need extra code to generate an invalid state?

westonpace · 2025-03-26T11:35:49Z

rust/lance-index/src/vector/pq/storage.rs

@@ -1005,6 +1017,40 @@ mod tests {
            .unwrap()
    }

+    async fn create_pq_storage_with_extra_column() -> ProductQuantizationStorage {


Can we still get PQ storage with an extra column from a real workflow? Or is this just generating some kind of invalid input for testing?

It's just for testing; we shouldn't see any extra column in a real workflow.

westonpace · 2025-03-26T11:36:28Z

rust/lance-index/src/vector/pq/storage.rs

@@ -1062,4 +1108,25 @@ mod tests {
        let dist2 = storage.dist_between(v, u);
        assert_eq!(dist1, dist2);
    }
+
+    #[tokio::test]
+    async fn test_remap_with_extra_column() {


Is this because some old indices will have this extra column and we need to make sure they are supported?

Right, we saw some feedback about this, so I added this test to make sure old indices work with this fix.
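
One way to tolerate old indices that still carry the extra column is to project the stored data onto the expected columns before using it. The following is a minimal sketch of that idea, not the implementation in this PR: `StoredBatch` and `project_to_expected` are hypothetical, the expected column names `_rowid` and `__pq_code` are assumptions made for illustration, and values are simplified to `u64`.

```rust
/// Columns the storage is expected to contain.
/// These names are assumptions for illustration.
const EXPECTED_COLUMNS: [&str; 2] = ["_rowid", "__pq_code"];

/// Toy stand-in for a stored batch: named columns of u64 values.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct StoredBatch {
    columns: Vec<(String, Vec<u64>)>,
}

/// Keep only the expected columns, in the expected order.
/// Unexpected columns (for example a leftover `__ivf_part_id`) are ignored;
/// a missing expected column is reported as an error.
fn project_to_expected(batch: &StoredBatch) -> Result<StoredBatch, String> {
    let mut columns = Vec::with_capacity(EXPECTED_COLUMNS.len());
    for name in EXPECTED_COLUMNS {
        let col = batch
            .columns
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .ok_or_else(|| format!("missing column {}", name))?;
        columns.push(col.clone());
    }
    Ok(StoredBatch { columns })
}
```

Selecting by name rather than by position means the result does not depend on where the stray column sits in the old file.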


fix: schema isn't expected for IVF_PQ

2d96e3b

Signed-off-by: BubbleCal <[email protected]>

github-actions bot added the `bug` (Something isn't working) label Mar 26, 2025

BubbleCal added 4 commits March 26, 2025 13:35

add test

f0808b2

Signed-off-by: BubbleCal <[email protected]>

fix

045f661

Signed-off-by: BubbleCal <[email protected]>

fmt

b7a27ed

Signed-off-by: BubbleCal <[email protected]>

fix

52a9d7e

Signed-off-by: BubbleCal <[email protected]>

BubbleCal requested review from westonpace and wkalt March 26, 2025 06:51

fix

7d3556e

Signed-off-by: BubbleCal <[email protected]>

BubbleCal marked this pull request as ready for review March 26, 2025 07:45

westonpace reviewed Mar 26, 2025

BubbleCal requested a review from westonpace March 26, 2025 14:11

wkalt approved these changes Mar 26, 2025

BubbleCal merged commit 33634d3 into lancedb:main Mar 26, 2025
30 checks passed
