Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement the interfaces for add, knnsearch, serialize in pyramid #231

Merged
merged 1 commit into from
Jan 9, 2025

Conversation

inabao
Copy link
Collaborator

@inabao inabao commented Dec 18, 2024

@inabao inabao added the kind/feature New feature or request label Dec 18, 2024
@inabao inabao self-assigned this Dec 18, 2024
@inabao inabao force-pushed the new-pyramid-implementation branch 5 times, most recently from 22b8903 to 5d932f3 Compare December 19, 2024 08:17
src/index/pyramid_zparameters.cpp Outdated Show resolved Hide resolved
src/index/pyramid.cpp Outdated Show resolved Hide resolved
src/index/pyramid.h Outdated Show resolved Hide resolved
src/index/pyramid.h Outdated Show resolved Hide resolved
@LHT129 LHT129 self-requested a review December 25, 2024 06:12
Copy link
Collaborator

@LHT129 LHT129 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// Function to convert BinarySet to a Binary
Binary
binaryset_to_binary(const BinarySet binarySet) {
size_t totalSize = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totalSize -> total_size

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

namespace vsag {

// Function to convert BinarySet to a Binary
Binary
binaryset_to_binary(const BinarySet binarySet) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binarySet -> binary_set

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

size_t offset = 0;

for (const auto& key : keys) {
size_t keySize = key.size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keySize -> key_size

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


BinarySet
binary_to_binaryset(const Binary binary) {
BinarySet binarySet;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binarySet -> binary_set

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

size_t offset = 0;

while (offset < binary.size) {
size_t keySize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keySize -> key_size

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

start = end + 1;
end = str.find(delimiter, start);
}
std::string lastToken = str.substr(start);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastToken -> last_token

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

template <typename T>
using Deque = std::deque<T, vsag::AllocatorWrapper<T>>;

constexpr static const char PART_SLASH = '/';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to file head ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

constexpr static const char PART_SLASH = '/';
constexpr static const char PART_OCTOTHORPE = '#';
std::vector<std::string>
split(const std::string& str, char delimiter) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use std lib or boost lib split implement ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is no split function in the standard library. Currently, vsag has no dependencies on Boost, so it would be a bit excessive to introduce Boost just for the split function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,can move to common utils file

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reference high performance implement (eg. boost / absl)

if (indexes_.find(result[0]) == indexes_.end()) {
indexes_[result[0]] = std::make_shared<IndexNode>(commom_param_.allocator_.get());
}
std::shared_ptr<IndexNode> node = indexes_.at(result[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: here can reduce one find cost

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

node->children[result[j]] =
std::make_shared<IndexNode>(commom_param_.allocator_.get());
}
node = node->children.at(result[j]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@inabao inabao force-pushed the new-pyramid-implementation branch from 2ee1a92 to 3b6956c Compare January 2, 2025 03:38
Copy link

codecov bot commented Jan 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

see 24 files with indirect coverage changes

constexpr static const char PART_SLASH = '/';
constexpr static const char PART_OCTOTHORPE = '#';
std::vector<std::string>
split(const std::string& str, char delimiter) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,can move to common utils file

auto data_vectors = base->GetFloat32Vectors();
for (int i = 0; i < data_num; ++i) {
std::string current_path = path[i];
auto result = split(current_path, PART_SLASH);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result -> path_slices ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

constexpr static const char PART_SLASH = '/';
constexpr static const char PART_OCTOTHORPE = '#';
std::vector<std::string>
split(const std::string& str, char delimiter) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reference high performance implement (eg. boost / absl)

auto path = query->GetPaths();

std::string current_path = path[0];
auto parsed_path = split(current_path, PART_SLASH);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parsed_path -> path_slices ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ret->Dim(0)->NumElements(1);
return ret;
}
std::shared_ptr<IndexNode> root = indexes_.at(parsed_path[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: reduce cost

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ret->Dim(0)->NumElements(1);
return ret;
}
std::shared_ptr<IndexNode> root = indexes_.at(parsed_path[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: reduce cost

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ret->Dim(0)->NumElements(1);
return ret;
}
std::shared_ptr<IndexNode> root = indexes_.at(parsed_path[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: reduce cost

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ret->Dim(0)->NumElements(1);
return ret;
}
std::shared_ptr<IndexNode> root = indexes_.at(parsed_path[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: reduce cost

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

auto result = node->index->KnnSearch(query, k, parameters, invalid);
if (result.has_value()) {
DatasetPtr r = result.value();
for (int i = 0; i < r->GetDim(); ++i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r->GetDim() ?
is r->GetElements() ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r->GetDim() denotes the number of the results for single query. r->GetElements() denotes the number of queries

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder: Update in next PR

}
} else {
for (const auto& item : node->children) {
candidate_indexes.emplace_back(item.second);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do search for all sub tree ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a node has an index, then its child nodes will not be searched.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary to introduce retrieval logic that recursively flows from higher layers to lower layers.

@inabao inabao force-pushed the new-pyramid-implementation branch 2 times, most recently from c263cdd to d02dcaa Compare January 6, 2025 06:43
src/default_allocator.h Outdated Show resolved Hide resolved
auto result = node->index->KnnSearch(query, k, parameters, invalid);
if (result.has_value()) {
DatasetPtr r = result.value();
for (int i = 0; i < r->GetDim(); ++i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder: Update in next PR

@inabao inabao force-pushed the new-pyramid-implementation branch 3 times, most recently from 7b23037 to 5f571db Compare January 7, 2025 07:28
@inabao inabao force-pushed the new-pyramid-implementation branch from 0270eea to 3d47fe7 Compare January 8, 2025 13:10
Copy link
Collaborator

@jiaweizone jiaweizone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some issues will fix in next PR

@inabao inabao merged commit 462ae8c into main Jan 9, 2025
11 checks passed
@inabao inabao deleted the new-pyramid-implementation branch January 9, 2025 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants