-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support safe update #308
base: main
Are you sure you want to change the base?
support safe update #308
Conversation
LOG_ERROR_AND_RETURNS(ErrorType::INVALID_ARGUMENT, | ||
fmt::format("failed to pretrain(invalid argument): base tag id " | ||
"({}) doesn't belong to index", | ||
base_tag_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a minor fix: error message "bas tag id" to "base tag id"
* @return result indicates whether the update operation is successful. | ||
*/ | ||
virtual tl::expected<bool, Error> | ||
UpdateVector(int64_t id, const DatasetPtr& new_base, bool need_fine_tune = false) { | ||
UpdateVector(int64_t id, const DatasetPtr& new_base, bool force_update = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. We are not responsible for the consequences of this interface if the user specifies force_update = true
Codecov ReportAll modified and coverable lines are covered by tests ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
"ef_search": {} | ||
}} | ||
}})", | ||
vsag::UPDATE_CHECK_SEARCH_L), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
float neighbor_dist = | ||
std::reinterpret_pointer_cast<hnswlib::HierarchicalNSW>(alg_hnsw_) | ||
->getDistanceByLabel(result.value()->GetIds()[i], new_base_vec); | ||
if (neighbor_dist < self_dist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here need set a range limit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can implicitly check the range by checking the neighborhood relationship, since it's hard to directly give a range that is general for every dataset. Besides, if the nearest neighbor of the updated vector is still the old vector, it remains our assumption of this interface (the updated vector stays in a nearby area around the original vector).
for (int i = 0; i < result.value()->GetDim(); i++) { | ||
float neighbor_dist = | ||
std::reinterpret_pointer_cast<hnswlib::HierarchicalNSW>(alg_hnsw_) | ||
->getDistanceByLabel(result.value()->GetIds()[i], new_base_vec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#309 provides a batch interface, which can be considered for use later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix this in other pr?
Signed-off-by: zhongxiaoyao.zxy <[email protected]>
652697a
to
9ef9613
Compare
closes: #225