Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: replicate boundary vectors to multiple partitions #2258

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

BubbleCal
Copy link
Contributor

No description provided.

@BubbleCal BubbleCal marked this pull request as draft April 24, 2024 23:36
@@ -106,6 +113,80 @@ impl<T: ArrowFloatType + L2 + Dot> IvfTransformer<T> {

UInt32Array::from_iter(result.iter().flatten().copied())
}

/// Compute the partition for each row in the input Matrix.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we use this in search? or just in build time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only in build time

Copy link
Contributor

@chebbyChefNEQ chebbyChefNEQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one concern: can we define compute_partitions in terms of compute_multiple_partitions instead of duplicating the code?

same for compute_multiple_memberships

let part_ids = self.compute_partitions(&mat).await;
let field = Field::new(PART_ID_COLUMN, part_ids.data_type().clone(), true);
Ok(batch.try_with_column(field, Arc::new(part_ids))?)
let part_ids = self.compute_multiple_partitions(&mat).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe we should use a different transformer for search?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we use transformer for search?
the code path should be IVFIndex::find_partitions -> IvfImpl::find_partitions -> kmeans::find_partitions right?

@BubbleCal
Copy link
Contributor Author

just one concern: can we define compute_partitions in terms of compute_multiple_partitions instead of duplicating the code?

same for compute_multiple_memberships

sure we can, just duplicated the code for now because compute_partitions is extremely optimized and I paid as less as possible effort to get this feature worked first.
will remove the dup later

Signed-off-by: BubbleCal <[email protected]>
@BubbleCal BubbleCal added the donotmerge Do not merge label May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
donotmerge Do not merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants