feat: build invert index in transaction #3452

Draft
wants to merge 4 commits into
base: main
from

Conversation

chenkovsky
Contributor

@chenkovsky chenkovsky commented Feb 14, 2025

Related to #3269.

I want to build an inverted index for Lance on a distributed system (Ray/Spark). So far, I have modified the index-creation interface to accept an array of fragment IDs. When this array is passed in, the interface returns an index object, and the CreateIndex operation is committed as a final step. Because no global statistics are available yet, the BM25 scoring used in search may not be very accurate and may need to be modified. Overall, the basic functionality is usable. I'd like to hear everyone's opinions first on whether this approach is feasible.

I also changed the CreateIndex operation definition in Python to make it similar to the Rust version. I don't know why it differed from the Rust version.
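
To make the BM25 concern concrete, here is a minimal, self-contained sketch in plain Python. It does not use the Lance API; `collect_stats`, `merge_stats`, and `bm25_score` are hypothetical helpers, and the two "fragments" are toy data. It assumes the common BM25 formulation with a Lucene-style IDF, which may differ from what Lance implements. The point is only that document count, document frequency, and average document length are corpus-wide quantities, so scoring with per-fragment values can rank documents differently than scoring with merged values.

```python
import math
from collections import Counter


def collect_stats(docs):
    """Return (num_docs, total_tokens, doc_freq) for a list of tokenized docs."""
    doc_freq = Counter()
    total_tokens = 0
    for tokens in docs:
        total_tokens += len(tokens)
        doc_freq.update(set(tokens))  # count each term once per document
    return len(docs), total_tokens, doc_freq


def merge_stats(stats_list):
    """Combine per-fragment statistics into corpus-wide statistics."""
    num_docs = sum(s[0] for s in stats_list)
    total_tokens = sum(s[1] for s in stats_list)
    doc_freq = Counter()
    for s in stats_list:
        doc_freq.update(s[2])
    return num_docs, total_tokens, doc_freq


def bm25_score(query, doc, stats, k1=1.2, b=0.75):
    """Score one tokenized doc against a query using the given statistics."""
    num_docs, total_tokens, doc_freq = stats
    avgdl = total_tokens / num_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        f = tf[term]
        if f == 0:
            continue
        n = doc_freq[term]
        # Lucene-style IDF (an assumption, not necessarily Lance's formula).
        idf = math.log(1.0 + (num_docs - n + 0.5) / (n + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


# Toy fragments: "lance" is common in fragment A and rare in fragment B.
fragment_a = [["lance", "index"], ["lance", "format"], ["lance", "search"]]
fragment_b = [["spark", "job"], ["ray", "cluster"], ["lance", "on", "ray"]]

stats_a = collect_stats(fragment_a)
stats_b = collect_stats(fragment_b)
global_stats = merge_stats([stats_a, stats_b])

query = ["lance"]
doc_a, doc_b = fragment_a[0], fragment_b[2]

# Scores computed with statistics local to each fragment.
local_a = bm25_score(query, doc_a, stats_a)
local_b = bm25_score(query, doc_b, stats_b)

# Scores computed with merged, corpus-wide statistics.
global_a = bm25_score(query, doc_a, global_stats)
global_b = bm25_score(query, doc_b, global_stats)

print("local :", local_a, local_b)
print("global:", global_a, global_b)
```

In this toy data the relative order of the two documents flips: with local statistics `doc_b` outranks `doc_a`, while with merged statistics `doc_a` outranks `doc_b`. One possible direction, stated here only as an assumption, is to have each partition report its counts so they can be summed before scoring, as `merge_stats` does above.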

@github-actions github-actions bot added the enhancement and python labels Feb 14, 2025
@codecov-commenter

Codecov Report

Attention: Patch coverage is 57.14286% with 39 lines in your changes missing coverage. Please review.

Project coverage is 78.83%. Comparing base (8a61b69) to head (7d063a3).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/io/exec/fts.rs | 63.26% | 12 Missing and 6 partials ⚠️ |
| rust/lance/src/index/scalar.rs | 5.88% | 15 Missing and 1 partial ⚠️ |
| rust/lance/src/index.rs | 83.33% | 1 Missing and 3 partials ⚠️ |
| rust/lance/src/dataset.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3452      +/-   ##
==========================================
- Coverage   78.93%   78.83%   -0.11%     
==========================================
  Files         251      251              
  Lines       92267    92859     +592     
  Branches    92267    92859     +592     
==========================================
+ Hits        72833    73204     +371     
- Misses      16463    16677     +214     
- Partials     2971     2978       +7     
| Flag | Coverage Δ |
|---|---|
| unittests | 78.83% <57.14%> (-0.11%) ⬇️ |

