Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-13350: multi-threaded search: replace cached with fixed threadpool #2508

Merged

Conversation

cpoerschke
Copy link
Contributor

Comment on lines 446 to 450
this.collectorExecutor =
ExecutorUtil.newMDCAwareCachedThreadPool(
ExecutorUtil.newMDCAwareFixedThreadPool(
cfg.getIndexSearcherExecutorThreads(), // thread count
cfg.getIndexSearcherExecutorThreads() * 1000, // queue size
new SolrNamedThreadFactory("searcherCollector"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/ThreadPoolExecutor.html documentation says "If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread." and "If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected."

I think this means that if the queue size is generous (relative to the use of the thread pool) then queuing will always be possible and search will remain single-threaded.

@cpoerschke cpoerschke marked this pull request as ready for review June 7, 2024 16:58
@dsmiley
Copy link
Contributor

dsmiley commented Jun 18, 2024

Can you recommend a way to benchmark this myself, like if I wanted to tweak it to see how my ideas work out?

@cpoerschke
Copy link
Contributor Author

Can you recommend a way to benchmark this myself, like if I wanted to tweak it to see how my ideas work out?

I've shared the "just pass an executor" patch onto a Solr 9.5 cloud that I used, in case that's useful.

The benchmarking experiments I ran used Python's threading module and (I think under the hood) pysolr also, measuring QTime/elapsed time/CPU utilisation, for example. My experiments were for dense vector search, I haven't yet looked into if/how other searches might use the passed in executor at the Lucene level. But it would be worth mentioning at this point that the collection was quite multi-sharded i.e. that being independent of the type of search being run.

@gus-asf
Copy link
Contributor

gus-asf commented Jun 26, 2024

Along with this maybe we want to change

 public static final int DEFAULT_INDEX_SEARCHER_EXECUTOR_THREADS = 4;

to

public static final int DEFAULT_INDEX_SEARCHER_EXECUTOR_THREADS = Runtime.getRuntime().availableProcessors();

@cpoerschke
Copy link
Contributor Author

Along with this maybe we want to change

 public static final int DEFAULT_INDEX_SEARCHER_EXECUTOR_THREADS = 4;

to

public static final int DEFAULT_INDEX_SEARCHER_EXECUTOR_THREADS = Runtime.getRuntime().availableProcessors();

Opened #2569 to separate that less controversial (?) change out from the less obvious (?) change here. Also there's some associated doc and config changes needed to go with making the default runtime aware.

@cpoerschke cpoerschke merged commit dfdbf85 into apache:main Jul 23, 2024
4 checks passed
@cpoerschke cpoerschke deleted the SOLR-13350-fixed-not-cached-threadpool branch July 23, 2024 16:42
asfgit pushed a commit that referenced this pull request Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants