Skip to content

Conversation

@wxyucs
Copy link
Collaborator

@wxyucs wxyucs commented Dec 22, 2025

No description provided.

@wxyucs wxyucs requested review from LHT129 and inabao December 22, 2025 08:52
@wxyucs wxyucs self-assigned this Dec 22, 2025
@wxyucs wxyucs requested a review from jiaweizone as a code owner December 22, 2025 08:52
@gemini-code-assist
Copy link

Summary of Changes

Hello @wxyucs, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the robustness of the search parameter parsing across various vector search indices. It introduces new test infrastructure and specific test cases for DiskANN, HGraph, HNSW, and SINDI to ensure that their search functionalities can handle and ignore irrelevant parameters within the search configuration. This prevents potential errors or unexpected behavior when users provide extraneous information, thereby improving the overall stability and fault tolerance of the search system.

Highlights

  • New Test Utility Function: A new static method, TestSearchUnrelatedParameter, has been added to the TestIndex class. This function standardizes the process of testing how search operations handle parameters that are not directly relevant to the index's configuration.
  • Expanded Test Coverage for Search Parameters: New test cases have been introduced for DiskANN, HGraph, HNSW, and SINDI indices. These tests specifically verify that the search functions correctly process valid parameters while gracefully ignoring any extraneous or 'unrelated' parameters provided in the search configuration JSON.
  • Test Resource Management Refactoring: The DiskANN and HNSW test suites have been refactored to include dedicated TestResource classes and GetResource methods. This improves the organization and reusability of test data and configurations.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@wxyucs wxyucs added the kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) label Dec 22, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces tests to ensure that search operations handle unrelated parameters gracefully, which is a valuable addition for API robustness. The refactoring of test fixtures in test_diskann.cpp and test_hnsw.cpp to centralize test resource management is also a good improvement. I've identified a few minor issues, including typos in test names and JSON keys, a copy-paste error in test tags, and some redundant loops in the new tests that could be optimized. Addressing these points will enhance the quality and maintainability of the test suite.

"ef_search": 200,
"io_limit": 200,
"beam_search": 4,
"-------unreleated parameters below-------": true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the JSON key. It should be "unrelated" instead of "unreleated".

Suggested change
"-------unreleated parameters below-------": true,
"-------unrelated parameters below-------": true,

Comment on lines +767 to +829
for (auto& [base_quantization_str, recall] : resource->test_cases) {
INFO(fmt::format("metric_type: {}, dim: {}, base_quantization_str: {}, recall: {}",
metric_type,
dim,
base_quantization_str,
recall));
vsag::Options::Instance().set_block_size_limit(size);
auto param =
DiskANNTestIndex::GenerateDiskANNBuildParametersString(metric_type, dim);
auto index = TestIndex::TestFactory(test_index->name, param, true);
auto dataset = DiskANNTestIndex::pool.GetDatasetAndCreate(
dim, resource->base_count, metric_type);
TestIndex::TestBuildIndex(index, dataset, true);
TestIndex::TestSearchUnrelatedParameter(index, dataset, search_param);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The loop over resource->test_cases is redundant because the variables base_quantization_str and recall are only used for logging. The GenerateDiskANNBuildParametersString function doesn't use base_quantization_str, so the test logic inside the loop is identical for each iteration. To avoid redundant test executions, especially if more test cases are added later, consider moving the test logic outside of this loop.

constexpr const char* search_param = R"({
"hgraph": {
"ef_search": 200,
"-------unreleated parameters below-------": true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the JSON key. It should be "unrelated" instead of "unreleated".

                "-------unrelated parameters below-------": true,


TEST_CASE_PERSISTENT_FIXTURE(fixtures::HGraphTestIndex,
"(PR) HGraph SearchUnrelatedParameter",
"[ft][diskann]") {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This appears to be a copy-paste error. The test tag should be [ft][hgraph] instead of [ft][diskann] for an HGraph test.

                             "[ft][hgraph]") {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


TEST_CASE_PERSISTENT_FIXTURE(fixtures::HGraphTestIndex,
"(Daily) HGraph SearchUnrelatedParameter",
"[ft][diskann]") {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This appears to be a copy-paste error. The test tag should be [ft][hgraph] instead of [ft][diskann] for an HGraph test.

                             "[ft][hgraph]") {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

constexpr const char* search_param = R"({
"hnsw": {
"ef_search": 200,
"-------unreleated parameters below-------": true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the JSON key. It should be "unrelated" instead of "unreleated".

                "-------unrelated parameters below-------": true,

Comment on lines +741 to +754
for (auto& [base_quantization_str, recall] : resource->test_cases) {
INFO(fmt::format("metric_type: {}, dim: {}, base_quantization_str: {}, recall: {}",
metric_type,
dim,
base_quantization_str,
recall));
vsag::Options::Instance().set_block_size_limit(size);
auto param = HNSWTestIndex::GenerateHNSWBuildParametersString(metric_type, dim);
auto index = TestIndex::TestFactory(test_index->name, param, true);
auto dataset =
HNSWTestIndex::pool.GetDatasetAndCreate(dim, resource->base_count, metric_type);
TestIndex::TestBuildIndex(index, dataset, true);
TestIndex::TestSearchUnrelatedParameter(index, dataset, search_param);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The loop over resource->test_cases is redundant. The variables base_quantization_str and recall are only used for logging, and GenerateHNSWBuildParametersString doesn't use base_quantization_str. This means the test logic inside the loop is the same for each iteration. Consider moving the test logic outside of this loop to avoid redundant test runs.

}

TEST_CASE_PERSISTENT_FIXTURE(fixtures::SINDITestIndex,
"SINDI Search Unreleated Param",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the test case name. It should be "Unrelated" instead of "Unreleated".

                             "SINDI Search Unrelated Param",

"n_candidate": 20,
"query_prune_ratio": 0.0,
"term_prune_ratio": 0.0,
"-------unreleated parameters below-------": true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the JSON key. It should be "unrelated" instead of "unreleated".

                "-------unrelated parameters below-------": true,

@wxyucs wxyucs force-pushed the add-search-unrelated-param-tests branch from 8dcc5e1 to 961f021 Compare December 22, 2025 11:29
TEST_CASE_PERSISTENT_FIXTURE(fixtures::DiskANNTestIndex,
"(PR) DiskANN SearchUnrelatedParameter",
"[ft][diskann]") {
auto test_index = std::make_shared<DiskANNTestIndex>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need pr tag

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"[ft][diskann]") {
auto test_index = std::make_shared<DiskANNTestIndex>();
auto resource = test_index->GetResource(false);
TestDiskANNSearchUnrelatedParameter(test_index, resource);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need daily tag

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


TEST_CASE_PERSISTENT_FIXTURE(fixtures::HGraphTestIndex,
"(PR) HGraph SearchUnrelatedParameter",
"[ft][diskann]") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


TEST_CASE_PERSISTENT_FIXTURE(fixtures::HGraphTestIndex,
"(Daily) HGraph SearchUnrelatedParameter",
"[ft][diskann]") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@wxyucs wxyucs force-pushed the add-search-unrelated-param-tests branch from 961f021 to a2b1514 Compare December 25, 2025 05:56
Copy link
Collaborator

@LHT129 LHT129 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov
Copy link

codecov bot commented Dec 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             0.16    #1455      +/-   ##
==========================================
- Coverage   92.23%   92.23%   -0.01%     
==========================================
  Files         295      295              
  Lines       15660    15659       -1     
==========================================
- Hits        14444    14443       -1     
  Misses       1216     1216              
Flag Coverage Δ
cpp 92.23% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
common 92.62% <ø> (ø)
datacell 92.46% <ø> (+0.17%) ⬆️
index 91.13% <ø> (-0.10%) ⬇️
simd 100.00% <ø> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7aec6a...88a4dde. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@wxyucs wxyucs force-pushed the add-search-unrelated-param-tests branch 2 times, most recently from 9710678 to b5c06dd Compare December 31, 2025 07:51
@wxyucs wxyucs force-pushed the add-search-unrelated-param-tests branch 5 times, most recently from 3b3b54f to ba982a8 Compare January 16, 2026 03:13
@wxyucs wxyucs force-pushed the add-search-unrelated-param-tests branch from ba982a8 to 88a4dde Compare January 16, 2026 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) module/testing size/L version/0.16

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants