Skip to content

fix: schema isn't expected for IVF_PQ #3606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 26, 2025

Conversation

BubbleCal
Copy link
Contributor

@BubbleCal BubbleCal commented Mar 26, 2025

now we drop the __ivf_part_id when shuffling, the corner is that num_partitions=1:

  1. if num_partitions=1 then no shuffling is needed
  2. the shuffler reader would return the data directly
  3. then the __ivf_part_id is not dropped, it's written into the index file as well

@github-actions github-actions bot added the bug Something isn't working label Mar 26, 2025
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
Signed-off-by: BubbleCal <[email protected]>
@BubbleCal BubbleCal requested review from westonpace and wkalt March 26, 2025 06:51
Signed-off-by: BubbleCal <[email protected]>
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 92.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.68%. Comparing base (20cde3b) to head (7d3556e).

Files with missing lines Patch % Lines
rust/lance-index/src/vector/pq/storage.rs 95.16% 1 Missing and 2 partials ⚠️
rust/lance/src/index/vector/ivf/v2.rs 33.33% 2 Missing ⚠️
rust/lance-index/src/vector/storage.rs 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3606      +/-   ##
==========================================
+ Coverage   78.67%   78.68%   +0.01%     
==========================================
  Files         258      258              
  Lines       96817    96890      +73     
  Branches    96817    96890      +73     
==========================================
+ Hits        76172    76242      +70     
+ Misses      17578    17576       -2     
- Partials     3067     3072       +5     
Flag Coverage Δ
unittests 78.68% <92.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@BubbleCal BubbleCal marked this pull request as ready for review March 26, 2025 07:45
Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find. A few small questions because I'm not sure why we need extra code to generate an invalid state?

@@ -1005,6 +1017,40 @@ mod tests {
.unwrap()
}

async fn create_pq_storage_with_extra_column() -> ProductQuantizationStorage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still get PQ storage with an extra column from a real workflow? Or is this just generating some kind of invalid input for testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just for testing, we shouldn't see any extra column in real workflow

@@ -1062,4 +1108,25 @@ mod tests {
let dist2 = storage.dist_between(v, u);
assert_eq!(dist1, dist2);
}

#[tokio::test]
async fn test_remap_with_extra_column() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because some old indices will have this extra column and we need to make sure they are supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, we saw some feedbacks about this, so add this test to make sure the old indices could work with this fix

@BubbleCal BubbleCal requested a review from westonpace March 26, 2025 14:11
@BubbleCal BubbleCal merged commit 33634d3 into lancedb:main Mar 26, 2025
30 checks passed
Jay-ju pushed a commit to Jay-ju/lance that referenced this pull request Mar 28, 2025
now we drop the `__ivf_part_id` when shuffling, the corner is that
`num_partitions=1`:
1. if `num_partitions=1` then no shuffling is needed
2. the shuffler reader would return the data directly
3. then the `__ivf_part_id` is not dropped, it's written into the index
file as well

---------

Signed-off-by: BubbleCal <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants