[WIP] fix(pyramid): fix error in Pyramid CI #1408
base: main
Conversation
Reviewer's Guide: Adds diagnostic logging to the Pyramid serialization test to debug mismatched search results, and introduces a helper shell script to run the Pyramid Serialize File tests in parallel for CI reproduction and stress testing.
Summary of Changes: Hello @inabao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an error in the Pyramid CI by introducing a new shell script for parallel execution of functional tests and augmenting the serialization test with diagnostic logging.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- The added debug printing in `TestSerializeBinarySet` (flag `pr` and multiple `std::cout`/`fmt::format` calls) looks like temporary diagnostics; consider either removing it or reworking it to use the test framework's logging facilities so test output doesn't become noisy in normal runs.
- The new `test.sh` script runs an infinite loop with hard-coded paths and parameters; if this is meant as a local stress tool, consider moving it to a separate tooling directory, adding a way to bound the run (e.g., max iterations or duration), and avoiding the hard-coded `build-release/tests/functests` so it's usable across environments.
- In `test.sh`, the `persistent_parallel` loop checks `jobs -p | wc -l` in a tight loop with a very short sleep; consider increasing the sleep interval or restructuring to avoid unnecessary CPU usage when processes are already at the desired concurrency.
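As a sketch of the bounding suggestion above, the stress loop could cap both iteration count and wall-clock time. Everything here is illustrative, not part of the PR's `test.sh`: `bounded_stress` and its parameters are hypothetical names, and the hard-coded binary path from the PR appears only in the commented example.

```shell
#!/usr/bin/env bash
# Hedged sketch: a bounded version of the stress loop. `bounded_stress` and
# its parameters are illustrative names, not taken from the PR.
bounded_stress() {
    local max_iterations=$1 max_seconds=$2
    shift 2
    local start=$SECONDS i
    for ((i = 1; i <= max_iterations; i++)); do
        if (( SECONDS - start >= max_seconds )); then
            echo "time budget exhausted after $((i - 1)) iterations" >&2
            return 0
        fi
        "$@" || return 1  # stop on the first failing run
    done
}

# Example invocation (binary path hard-coded as in the PR; could be made configurable):
# bounded_stress 100 600 ./build-release/tests/functests "Pyramid Serialize File"
```

Exposing the limits as variables (or CLI flags) would let CI use a tight budget while local debugging runs longer.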
## Individual Comments
### Comment 1
<location> `tests/test_index.cpp:989-998` </location>
<code_context>
+ bool pr = false;
</code_context>
<issue_to_address>
**issue (testing):** The original assertion comparing result IDs has been removed, so this test no longer verifies the expected behavior.
This change removes the core correctness check (`res_to.value()->GetIds()[j] == res_from.value()->GetIds()[j]`) and replaces it with a flag/logging path that never fails the test, so regressions could pass unnoticed. Please restore an assertion on the IDs (e.g., `REQUIRE`/`CHECK`) and keep the logging as a supplement. If ID equality is no longer required, update the test name and add assertions that clearly express the new expected behavior instead.
</issue_to_address>
Code Review
This pull request introduces a shell script for running tests in parallel and modifies a test case, likely to debug a CI failure. My review focuses on improving the robustness of the shell script and fixing a critical issue in the modified C++ test case where an assertion was removed, effectively disabling the test. I've provided suggestions to handle test failures correctly in the script and to restore the test's assertion while keeping the intended debug logging.
```diff
+            bool pr = false;
             for (auto j = 0; j < topk; ++j) {
-                REQUIRE(res_to.value()->GetIds()[j] == res_from.value()->GetIds()[j]);
+                if (res_to.value()->GetIds()[j] != res_from.value()->GetIds()[j]) {
+                    pr = true;
+                }
             }
+            if (pr) {
+                auto query_path = dataset->query_->GetPaths()[i];
+                std::cout << "query path: " << query_path << std::endl;
+                for (auto j = 0; j < topk; ++j) {
+                    auto from_id = res_from.value()->GetIds()[j] >> 16;
+                    auto to_id = res_to.value()->GetIds()[j] >> 16;
+                    auto from_path = dataset->base_->GetPaths()[from_id];
+                    auto to_path = dataset->base_->GetPaths()[to_id];
+                    auto from_distance = res_from.value()->GetDistances()[j];
+                    auto to_distance = res_to.value()->GetDistances()[j];
+                    std::cout << fmt::format("rank {}: from_id {}, from_path {}, from_distance {:.6f} | to_id {}, to_path {}, to_distance {:.6f}",
+                                             j, from_id, from_path, from_distance, to_id, to_path, to_distance)
+                              << std::endl;
+                }
+                std::cout << std::endl;
+            }
```
The original `REQUIRE` assertion has been removed and replaced with logging. This is a critical issue: the test will no longer fail on a mismatch, potentially hiding bugs. While the detailed logging is useful for debugging, the test must still fail.
The current implementation is also inefficient, as it iterates through all `topk` results looking for a mismatch even after one has been found.
I suggest refactoring this to first check for a mismatch and, if one is found, print the detailed debug information and then explicitly fail the test using `REQUIRE(false)`. This restores the test's correctness while keeping the valuable debug output.
```cpp
bool mismatch = false;
for (auto j = 0; j < topk; ++j) {
    if (res_to.value()->GetIds()[j] != res_from.value()->GetIds()[j]) {
        mismatch = true;
        break;
    }
}
if (mismatch) {
    auto query_path = dataset->query_->GetPaths()[i];
    std::cout << "Mismatch in TestSerializeBinarySet for query path: " << query_path << std::endl;
    for (auto j = 0; j < topk; ++j) {
        auto from_id_full = res_from.value()->GetIds()[j];
        auto to_id_full = res_to.value()->GetIds()[j];
        auto from_id = from_id_full >> 16;
        auto to_id = to_id_full >> 16;
        auto from_path = dataset->base_->GetPaths()[from_id];
        auto to_path = dataset->base_->GetPaths()[to_id];
        auto from_distance = res_from.value()->GetDistances()[j];
        auto to_distance = res_to.value()->GetDistances()[j];
        std::cout << fmt::format("rank {}: from_id {}, from_path {}, from_distance {:.6f} | to_id {}, to_path {}, to_distance {:.6f}{}",
                                 j, from_id, from_path, from_distance, to_id, to_path, to_distance,
                                 (from_id_full != to_id_full ? " <-- MISMATCH" : ""))
                  << std::endl;
    }
    std::cout << std::endl;
    REQUIRE(false);  // Fail the test to make the CI failure visible
}
```

```shell
batch_parallel() {
    while true; do
        echo "Starting $CONCURRENCY parallel test processes..."
        for ((i=1; i<=$CONCURRENCY; i++)); do
            ./build-release/tests/functests "Pyramid Serialize File" &
        done
        wait  # wait for all processes in this batch to finish
    done
}
```
A bare `wait` with no arguments returns 0 once all background jobs have finished, regardless of whether any of them failed, so job failures are silently swallowed and the script keeps running. To make this script robust for CI, you should wait for each job individually and exit if any of them fails.
```diff
 batch_parallel() {
     while true; do
         echo "Starting $CONCURRENCY parallel test processes..."
         for ((i=1; i<=$CONCURRENCY; i++)); do
             ./build-release/tests/functests "Pyramid Serialize File" &
         done
-        wait  # wait for all processes in this batch to finish
+        for job in $(jobs -p); do
+            wait "$job" || exit $?
+        done
     done
 }
```
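The behavioral difference is easy to demonstrate in isolation. This minimal, self-contained snippet uses `true`/`false` as stand-ins for the test binary; it is a demonstration of the shell semantics, not code from the PR:

```shell
#!/usr/bin/env bash
# Demo: a bare `wait` reports success even when a background job failed,
# while waiting on each PID individually propagates the failure.
true & false &
wait
bare_status=$?        # 0: the failure of `false` is lost

true & false &
per_job_status=0
for job in $(jobs -p); do
    wait "$job" || per_job_status=$?
done
echo "bare wait: $bare_status, per-job wait: $per_job_status"
```

Running this prints a zero status for the bare `wait` and a non-zero status for the per-job loop, which is exactly why the per-job form is needed for CI.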
```shell
persistent_parallel() {
    while true; do
        jobs_count=$(jobs -p | wc -l)
        if [ $jobs_count -lt $CONCURRENCY ]; then
            ./build-release/tests/functests "Pyramid Serialize File" &
        else
            sleep 0.1  # avoid excessive CPU usage
        fi
    done
}
```
The current implementation of `persistent_parallel` uses a polling loop with `sleep`, which is inefficient and can lead to unnecessary CPU usage. A more efficient and modern approach is `wait -n` (available in bash 4.3+), which blocks until the next background job terminates. This avoids polling and also allows checking the exit status of each job as it completes.
```diff
 persistent_parallel() {
-    while true; do
-        jobs_count=$(jobs -p | wc -l)
-        if [ $jobs_count -lt $CONCURRENCY ]; then
-            ./build-release/tests/functests "Pyramid Serialize File" &
-        else
-            sleep 0.1  # avoid excessive CPU usage
-        fi
+    for ((i=1; i<=$CONCURRENCY; i++)); do
+        ./build-release/tests/functests "Pyramid Serialize File" &
+    done
+    # Requires bash 4.3+
+    while true; do
+        wait -n || exit $?
+        ./build-release/tests/functests "Pyramid Serialize File" &
     done
 }
```
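For reference, here is a self-contained illustration of the `wait -n` semantics (requires bash 4.3+). `sleep` and `false` stand in for slow and failing test processes; this is a demonstration, not code from the PR:

```shell
#!/usr/bin/env bash
# Demo: `wait -n` returns as soon as any one background job exits, with that
# job's exit status -- so a failure surfaces immediately, without polling.
sleep 1 & false &
if wait -n; then
    first_status=0
else
    first_status=$?
fi
echo "first job to finish exited with $first_status"
wait  # reap the remaining sleep
```

Because `false` exits long before `sleep 1`, the first `wait -n` returns its non-zero status right away, which is what lets the suggested `persistent_parallel` abort on the first test failure instead of waiting out a batch.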
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #1408      +/-   ##
==========================================
+ Coverage   91.81%   91.82%   +0.01%
==========================================
  Files         323      323
  Lines       18439    18439
==========================================
+ Hits        16930    16932       +2
+ Misses       1509     1507       -2
```
unused
Signed-off-by: jinjiabao.jjb <[email protected]>
Summary by Sourcery
Relax strict result equality in the Pyramid serialization test and add a helper script for stress-running the Pyramid functional tests in parallel.
Tests:
Chores: