Skip to content

[Draft] Multi-Tenant CMS Manager #14953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 24 commits into
base: main
Choose a base branch
from
Draft

Conversation

yaser-aj
Copy link

A proposed solution to #13883.

Currently, each ConcurrentMergeScheduler manages its own threads independently, assuming no other merge schedulers are running in the JVM. This leads to potential over-allocation of system resources when many IndexWriters are active. This draft PR demonstrates a simple approach to multi-tenancy by introducing a global manager that divides a global thread budget among all registered CMS instances.

Key Changes

  • New Class: MultiTenantCMSManager
    • Singleton manager that tracks all active ConcurrentMergeScheduler instances.
    • Divides a global thread budget (based on available CPU cores) equally among all registered CMSs.
    • Dynamically updates each CMS’s maxThreadCount and maxMergeCount as CMSs are registered or unregistered.
    • Includes testing hooks for registration and budget management.
  • Integration with ConcurrentMergeScheduler
    • CMS now registers itself with the manager upon construction and unregisters on close.
    • CMS thread/merge limits are automatically managed by the global manager.
  • Tests
    • Added unit tests for registration, dynamic rebalancing, live IndexWriter integration, and auto-unregistering behavior.

Important Notes

  • This is a demonstration/first step toward multi-tenant merge scheduling. The logic can be extended to support more advanced scheduling and dynamic resource allocation in the future.
  • The manager currently uses a simple equal-share policy for thread allocation (total thread count divided by merge scheduler count).

Will update this draft PR with more points + to-dos soon.

lukewilner and others added 24 commits July 3, 2025 16:11
Initial working MultiTenantCMSManager with 3 basic tests
A simplistic MultiTenantCMSManager implementation
A simplistic MultiTenantCMSManager implementation with 3 basic tests
add dynamic rebalancing and live writer tests
@lukewilner
Copy link

Continuing where @yaser-aj left off, here's what I plan to tackle next:

  • Improve the thread sharing logic: Right now it's a simple equal split, but I want to explore smarter ways to assign threads based on actual activity or load.
  • Stress test
  • Clean up unregister behavior: Making sure CMS instances get removed safely even if they’re GC’d or closed suddenly.
  • luceneutil benchmarks

Please let us know if we are missing anything important. Thanks!

Copy link
Contributor

@vigyasharma vigyasharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yaser, and Luke, for sharing a prototype of this approach. The simplicity of tweaking things from outside while leaving CMS untouched, is indeed tempting!

I do agree with @jpountz however that

  1. The simplest version (equal sharing) is too limited.
  2. As we start building more elaborate logic, things get very complex very quickly.
    • As a small example, an equal sharing strategy doesn't account for some schedulers getting more merges than others. Once you start doing that, updateBudget() must be called each time a merge is submitted or completed, which needs deeply integrated hooks.
    • The real complexity in designing and maintaining, lies in figuring out the right policy budget allocation policy.
  3. It doesn't provide a sure way of controlling the total no. of threads consumed by indexing+merging, which is a primary motivation for such a scheduler.

Hence, I'm leaning more in the direction of #14900, at least as of now. Happy to hear more thoughts on this.

Regardless, there's a lot of good work here, especially in the tests, which can be leveraged in any impl. we pursue for this change.

*
* @param cms the merge scheduler to unregister
*/
public void unregister(ConcurrentMergeScheduler cms) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we call it "deregister"?

int count = schedulers.size();
if (count == 0) return;

int share = Math.max(1, maxThreadCount / count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be Math.min(), otherwise, we'll set some schedulers to 0 maxMergeThreads.

ConcurrentMergeScheduler[] cmsArr = new ConcurrentMergeScheduler[numSchedulers];
for (int i = 0; i < numSchedulers; i++) {
cmsArr[i] = new ConcurrentMergeScheduler();
manager.register(cmsArr[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already register in the CMS ctor. Do we need to register again?


/** Used in tests to read the current registered CMS set */
synchronized Set<ConcurrentMergeScheduler> getRegisteredSchedulersForTest() {
return new HashSet<>(schedulers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just return a 1Collections.unmodifableSet()` ? and avoid allocating a new set and copy all the entries into it..

}

/** Used in tests to clear all registered CMS instances */
synchronized void unregisterAllForTest() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this method need to be synchronized ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants