[FEATURE] Improve EncryptorImpl with Asynchronous Handling for Scalability #3510

Open
dhrubo-os opened this issue Feb 6, 2025 · 0 comments
Labels
enhancement New feature or request

Is your feature request related to a problem?

Summary
The EncryptorImpl class encrypts and decrypts text using a tenant-specific master key. It relies on the initMasterKey method to ensure the master key is initialized, creating the plugins-ml-config index if necessary. If initialization takes longer than 3 seconds, the operation times out. Because the calling thread waits for that whole period, this approach scales poorly and handles encryption requests inefficiently.

Problem Statement
The current implementation relies on blocking behavior with a CountDownLatch in initMasterKey, enforcing a strict 3-second timeout. This has several drawbacks:

Scalability Bottleneck: Blocking the thread for up to 3 seconds can lead to resource exhaustion under high concurrency.
Timeout Risks: If the master key retrieval or initialization takes longer than expected due to cluster conditions, requests fail unnecessarily.
Suboptimal Asynchronous Handling: OpenSearch provides ActionListener for non-blocking operations, but the current approach does not leverage it effectively.
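
To make the drawbacks above concrete, here is a minimal sketch of the blocking pattern. It is not the actual EncryptorImpl code: the class name `BlockingKeyInitSketch`, the simulated lookup delay, and the `demo-master-key` value are all hypothetical, and the timeout is a parameter so the behavior can be exercised quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the blocking pattern; not the real EncryptorImpl.
class BlockingKeyInitSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /**
     * Simulates a master key lookup that takes lookupMillis.
     * The calling thread is parked until the lookup finishes or timeoutMillis elapses.
     */
    String initMasterKey(long lookupMillis, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<String> key = new AtomicReference<>();
        worker.execute(() -> {
            try {
                Thread.sleep(lookupMillis);      // stands in for the index read or creation
                key.set("demo-master-key");      // assumed placeholder value
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                latch.countDown();
            }
        });
        try {
            // The caller can do no other work while it waits here.
            if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Timed out waiting for master key");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for master key", e);
        }
        return key.get();
    }

    void shutdown() {
        worker.shutdownNow();
    }
}
```

In this sketch a slow lookup fails the request even though the key would have arrived shortly afterwards, and every concurrent caller holds a thread for the full wait.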

Proposed Solution
Refactor initMasterKey to use a fully asynchronous, ActionListener-based approach, removing the blocking timeout and ensuring encryption/decryption requests proceed once initialization completes. The improved flow should:

Replace CountDownLatch with ActionListener to handle master key initialization without blocking.
Queue encryption/decryption requests while the master key is being initialized, processing them once ready.
Ensure failure handling is robust: if the master key cannot be retrieved, propagate the underlying failure to the caller instead of timing out.

This aligns with OpenSearch's architecture, where long-running tasks should use non-blocking patterns to improve system throughput.
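
As a starting point for discussion, the flow above could look roughly like the following sketch. To keep it dependency-free it uses a small `KeyListener` interface as a stand-in for OpenSearch's ActionListener (same `onResponse` / `onFailure` shape). `AsyncKeyInitSketch`, `withMasterKey`, and the `keyLoader` callback are hypothetical names, and the retry-after-failure behavior is an assumption rather than an agreed design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Stand-in for OpenSearch's ActionListener so the sketch compiles on its own.
interface KeyListener {
    void onResponse(String masterKey);
    void onFailure(Exception e);
}

// Hypothetical sketch: callers register a listener instead of blocking on a latch.
class AsyncKeyInitSketch {
    private final Consumer<KeyListener> keyLoader;  // assumed async loader (e.g. an index read)
    private final List<KeyListener> pending = new ArrayList<>();
    private String masterKey;
    private boolean loading;

    AsyncKeyInitSketch(Consumer<KeyListener> keyLoader) {
        this.keyLoader = keyLoader;
    }

    /** Calls the listener right away if the key is cached, otherwise queues it. */
    void withMasterKey(KeyListener listener) {
        String ready;
        boolean startLoad = false;
        synchronized (this) {
            ready = masterKey;
            if (ready == null) {
                pending.add(listener);
                if (!loading) {          // only the first waiter triggers the load
                    loading = true;
                    startLoad = true;
                }
            }
        }
        if (ready != null) {
            listener.onResponse(ready);
        } else if (startLoad) {
            keyLoader.accept(new KeyListener() {
                public void onResponse(String key) { complete(key, null); }
                public void onFailure(Exception e) { complete(null, e); }
            });
        }
    }

    /** Drains the queue and notifies every waiter with the key or the failure. */
    private void complete(String key, Exception failure) {
        List<KeyListener> toNotify;
        synchronized (this) {
            masterKey = key;             // stays null on failure, so a later call retries
            loading = false;
            toNotify = new ArrayList<>(pending);
            pending.clear();
        }
        for (KeyListener l : toNotify) { // notify outside the lock
            if (failure == null) l.onResponse(key); else l.onFailure(failure);
        }
    }
}
```

An encrypt or decrypt call would then pass a listener to `withMasterKey` and do its work in `onResponse`. Queued listeners are notified outside the lock so a slow callback cannot stall new requests, and a failed load reaches every waiter as the original exception rather than a generic timeout.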

Expected Impact
Improved Performance: Eliminates unnecessary thread blocking, allowing more efficient request processing.
Better Scalability: Supports high-concurrency workloads without tying up threads that are blocked waiting on initialization.
Higher Reliability: Reduces unnecessary failures due to timeout conditions, ensuring encryption requests complete successfully as soon as the master key is available.

By adopting this approach, we enhance the robustness and efficiency of encryption handling in OpenSearch ML Commons, making it resilient to operational delays while ensuring a seamless user experience.

Would love to hear thoughts from the team on any additional considerations before moving forward with implementation.
