Skip to content

[BUGFIX]Catch Merge IOException#3140

Open
luyuncheng wants to merge 7 commits intoopensearch-project:mainfrom
luyuncheng:FIXMergeException
Open

[BUGFIX]Catch Merge IOException#3140
luyuncheng wants to merge 7 commits intoopensearch-project:mainfrom
luyuncheng:FIXMergeException

Conversation

@luyuncheng
Copy link
Collaborator

@luyuncheng luyuncheng commented Mar 3, 2026

Description

#2529 we made faiss engine can abort. so we catch some exceptions.

BUGFIX1: Exception

but when there is some runtime exception in native io, like disk io exception, other runtime exceptions. it should be catch by Lucene#ConcurrentMergeScheduler

public void mergeIndex(final Supplier<KNNVectorValues<?>> knnVectorValuesSupplier, int totalLiveDocs) throws IOException {
KNNVectorValues<?> knnVectorValues = knnVectorValuesSupplier.get();
initializeVectorValues(knnVectorValues);
if (knnVectorValues.docId() == NO_MORE_DOCS) {
// This is in place so we do not add metrics
log.debug("Skipping mergeIndex, vector values are already iterated for {}", fieldInfo.name);
return;
}
long bytesPerVector = knnVectorValues.bytesPerVector();
try {
startMergeStats(totalLiveDocs, bytesPerVector);
buildAndWriteIndex(knnVectorValuesSupplier, totalLiveDocs, false);
endMergeStats(totalLiveDocs, bytesPerVector);
} catch (IndexBuildAbortedException ex) {
log.warn("Merge Aborted for field {}", fieldInfo.name, ex);
throw new MergePolicy.MergeAbortedException("KNN Merge aborted.");
} catch (Exception ex) {
log.error("Merge exception happened for field {}", fieldInfo.name, ex);
}
}

we need made runtime exception throw to lucene level which can be handled by ConcurrentMergeScheduler#handleMergeException in NativeIndexWriter.java#L134

BUGFIX2: permission Access

In MergeAbortChecker we changed MergeThread accessibility in code:

static {
try {
MERGE_FIELD = ConcurrentMergeScheduler.MergeThread.class.getDeclaredField(MERGE_FIELD_NAME);
MERGE_FIELD.setAccessible(true);
hasMergeField = true;
} catch (NoSuchFieldException e) {
hasMergeField = false;
log.error("Not find merge field in MergeThread", e);
}
}

so we need add suppressAccessChecks and accessDeclaredMemberspermission into plugin-security.policy

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Comment on lines 133 to +135
} catch (Exception ex) {
log.error("Merge exception happened for field {}", fieldInfo.name, ex);
throw new IOException("Merge exception happened for field: " + fieldInfo.name, ex);
Copy link
Collaborator

@shatejas shatejas Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not a good idea to catch a generic exception and throw IOException. Can we catch a IOException and rethrow?

Comment on lines -420 to -422
catch (const faiss::FaissException& e) {
std::cout << "====> query get faiss exception:" << e.what() << "\n";
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change seems right but out out curiosity was the query returning empty response when faiss exception was thrown?

permission java.lang.RuntimePermission "loadLibrary.opensearchknn_faiss_avx512_spr";
permission java.net.SocketPermission "*", "connect,resolve";
permission java.lang.RuntimePermission "accessDeclaredMembers";
permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
Copy link
Collaborator

@shatejas shatejas Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how does it affect/what errors are seen if this is not present, was just curious?

Signed-off-by: Navneet Verma <navneev@amazon.com>
@codecov
Copy link

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.41%. Comparing base (a2372a3) to head (de3a0b3).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3140      +/-   ##
============================================
- Coverage     82.43%   82.41%   -0.02%     
+ Complexity     3874     3872       -2     
============================================
  Files           418      418              
  Lines         14474    14475       +1     
  Branches       1851     1851              
============================================
- Hits          11932    11930       -2     
- Misses         1782     1786       +4     
+ Partials        760      759       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Tejas Shah <shatejas@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants