SOLR-17160: Time based tracking of core admin requests with Caffeine cache#2304
SOLR-17160: Time based tracking of core admin requests with Caffeine cache#2304dsmiley merged 16 commits intoapache:mainfrom
Conversation
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
| requestStatusResponse.response = taskObject.getRspObject(); | ||
| requestStatusResponse.response = taskObject.getOperationRspObject(); |
There was a problem hiding this comment.
I'm confused; you are setting the response twice?
There was a problem hiding this comment.
Oh, good catch!
Something is definitely wrong here, but this is existing code. I did not notice that.
I think this deserves a a fix by itself. Values for msg and response in the V2 API response are not consistent depending on the status. And it seems to me they're not consistent with V1 either.
I think this was introduced with #2144.
Maybe I should put this PR on hold and fix this first?
There was a problem hiding this comment.
Ah; didn't know it was existing code. Do in whichever order you want; it doesn't really matter.
solr/core/src/java/org/apache/solr/handler/admin/api/GetNodeCommandStatus.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testTrackedRequestExpiration() throws Exception { |
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
|
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution! |
…ler.java Co-authored-by: David Smiley <dsmiley@apache.org>
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
| */ | ||
| private static final int MAX_TRACKED_REQUESTS = 10_000; | ||
| private static final int MAX_TRACKED_REQUESTS = | ||
| EnvUtils.getPropertyAsInteger("solr.admin.requests.running.max", 10_000); |
There was a problem hiding this comment.
As this is only for "async" requests, maybe the name should incorporate this limit like 'solr.admin.async.max". ("requests" and "running" being kind of implicit maybe)?
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
|
WDYT of this proposed CHANGES.txt under Improvements:
|
|
Perhaps an additional sentence to highlight that this helps collection backup/restore and other operations scale to 100+ shards. |
|
I handled CHANGES.txt myself. I'll merge this sometime this weekend. Looking forward to seeing this one done! |
# Conflicts: # solr/CHANGES.txt
) Core Admin "async" request status tracking is no longer capped at 100; it's 10k. Statuses are now removed 5 minutes after the read of a completed/failed status. Helps collection async backup/restore and other operations scale to 100+ shards. Co-authored-by: David Smiley <dsmiley@salesforce.com> (cherry picked from commit d3b4c2e)
Thanks for handling this @dsmiley. I was off for some time and unable to do any change. |
…ache#2304) Core Admin "async" request status tracking is no longer capped at 100; it's 10k. Statuses are now removed 5 minutes after the read of a completed/failed status. Helps collection async backup/restore and other operations scale to 100+ shards. Co-authored-by: David Smiley <dsmiley@salesforce.com>
https://issues.apache.org/jira/browse/SOLR-17160
This is a simpler version of #2271, that now uses Caffeine cache instead of manually dropping old requests.
Description
Core admin completed/failed requests are tracked in-memory, but we only track a maximum of 100 requests by dropping the oldest ones. If for any reason we drop a request before the client (an other Solr node) fetched the status, we will get unexpected errors at the top level command.
Solution
Instead of the hard limit of 100 tracked requests, we track all requests in a Caffeine cache, configured to drop old requests.
This is a trade-off to not keep memory forever and to be sure we won't lose status in the case of many incoming requests.
The change adds two different timeouts:
"solr.admin.requests.running.timeout.minutes")"solr.admin.requests.completed.timeout.minutes")Tests
Added a unit test in
CoreAdminHandlerTestto check expired requests are not tracked in-memory anymore.Also checks we don't have any issue for collection backup with very high numbers of shards.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.