Summary
ML Commons primarily relies on MLException and its derived classes:
ExecuteException
MLLimitExceededException
MLResourceNotFoundException
MLValidationException
These exceptions define log severity (reference) and contribute to stats updates in the following places:
Stats update code reference 1
Stats update code reference 2
However, this does not cover all exceptions used in ML Commons. The project also frequently uses OpensearchStatusException in multiple places.
This creates an inconsistency, where "not found" exceptions from OpensearchStatusException get incorrectly included in failure stats, even though they should not count as failures.
Problem Statement
Incomplete Exception Categorization
MLException is well-structured for handling ML-specific failures, but OpensearchStatusException is used without proper categorization.
This leads to misclassification of errors, particularly 404 Not Found, which should not contribute to failure metrics.
Inconsistent Logging Severity
ML Commons logs severity based on MLException, but errors from OpensearchStatusException do not follow the same log severity rules.
This results in inconsistent error reporting and debugging challenges.
Misclassified Stats Updates
The system updates failure stats when an MLException occurs, but some errors from OpensearchStatusException should be excluded.
Example: A 404 Not Found from OpensearchStatusException should not be counted as a failure, but it currently is.
Proposed Solution
Enhance MLExceptionUtils to:
Map OpensearchStatusException properly based on HTTP status codes:
404 Not Found → Should not update failure stats.
500 Internal Server Error → Should still be counted as a failure.
Ensure consistent logging levels:
Align OpensearchStatusException with MLException log severity rules.
Refactor all OpensearchStatusException handling through MLExceptionUtils:
Centralize exception handling for unified processing.
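To make the proposal concrete, here is a minimal sketch of what a centralized classifier could look like. It is not the existing MLExceptionUtils code. Every name in it (StatusExceptionSketch, StatusException, Severity, countsAsFailure, severityOf, recordIfFailure) is hypothetical, and StatusException is only a stand-in for an exception that carries an HTTP status code. The rule that all 4xx codes are treated like 404 is also an assumption; the proposal itself only names 404 and 500.

```java
// Hypothetical sketch: none of these names come from ML Commons or OpenSearch.
final class StatusExceptionSketch {

    /** Stand-in for an exception that carries an HTTP status code. */
    static class StatusException extends RuntimeException {
        final int status;

        StatusException(String message, int status) {
            super(message);
            this.status = status;
        }
    }

    /** Assumed log levels; the real severity rules may differ. */
    enum Severity { WARN, ERROR }

    /**
     * Decides whether an exception should update failure stats.
     * Assumption: 4xx codes (including 404) are expected, caller-side errors
     * and are excluded; 5xx codes still count as failures.
     */
    static boolean countsAsFailure(Throwable t) {
        if (t instanceof StatusException) {
            return ((StatusException) t).status >= 500;
        }
        // Exceptions without a status code are counted, to stay conservative.
        return true;
    }

    /** Derives the log level from the same rule, so stats and logs stay aligned. */
    static Severity severityOf(Throwable t) {
        return countsAsFailure(t) ? Severity.ERROR : Severity.WARN;
    }

    /** Returns the failure count after handling the exception. */
    static int recordIfFailure(Throwable t, int failureCount) {
        return countsAsFailure(t) ? failureCount + 1 : failureCount;
    }
}
```

With this shape, a call site would pass every caught exception through one method instead of deciding locally whether to bump the failure counter. For example, recordIfFailure(new StatusException("model not found", 404), 0) leaves the count at 0, while the same call with status 500 returns 1.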
Expected Impact
✅ More accurate failure statistics: No longer miscounting expected errors (e.g., 404) as system failures.
✅ Consistent log severity levels: Easier debugging and monitoring.
✅ Unified exception handling: Clearer classification between OpenSearch errors and ML Commons-specific errors.
This will improve system reliability, ensure consistent failure tracking, and reduce unnecessary alerts in logs.
Would love feedback before moving forward with implementation!