-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17160: Time based tracking of core admin requests with Caffeine cache #2304
Conversation
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
requestStatusResponse.response = taskObject.getRspObject(); | ||
requestStatusResponse.response = taskObject.getOperationRspObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused; you are setting the response twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good catch!
Something is definitely wrong here, but this is existing code. I did not notice that.
I think this deserves a a fix by itself. Values for msg
and response
in the V2 API response are not consistent depending on the status. And it seems to me they're not consistent with V1 either.
I think this was introduced with #2144.
Maybe I should put this PR on hold and fix this first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah; didn't know it was existing code. Do in whichever order you want; it doesn't really matter.
solr/core/src/java/org/apache/solr/handler/admin/api/GetNodeCommandStatus.java
Outdated
Show resolved
Hide resolved
@@ -528,4 +534,39 @@ public void testNonexistentCoreReload() throws Exception { | |||
e.getMessage()); | |||
admin.close(); | |||
} | |||
|
|||
@Test | |||
public void testTrackedRequestExpiration() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonderful test
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution! |
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
@@ -436,7 +436,8 @@ public static class CoreAdminAsyncTracker { | |||
* we're not supposed to hit it. This is just a protection to grow in memory too much when | |||
* receiving an abusive number of admin requests. | |||
*/ | |||
private static final int MAX_TRACKED_REQUESTS = 10_000; | |||
private static final int MAX_TRACKED_REQUESTS = | |||
EnvUtils.getPropertyAsInteger("solr.admin.requests.running.max", 10_000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is only for "async" requests, maybe the name should incorporate this limit like 'solr.admin.async.max". ("requests" and "running" being kind of implicit maybe)?
solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java
Outdated
Show resolved
Hide resolved
WDYT of this proposed CHANGES.txt under Improvements:
|
Perhaps an additional sentence to highlight that this helps collection backup/restore and other operations scale to 100+ shards. |
I handled CHANGES.txt myself. I'll merge this sometime this weekend. Looking forward to seeing this one done! |
# Conflicts: # solr/CHANGES.txt
) Core Admin "async" request status tracking is no longer capped at 100; it's 10k. Statuses are now removed 5 minutes after the read of a completed/failed status. Helps collection async backup/restore and other operations scale to 100+ shards. Co-authored-by: David Smiley <[email protected]> (cherry picked from commit d3b4c2e)
Thanks for handling this @dsmiley. I was off for some time and unable to do any change. |
https://issues.apache.org/jira/browse/SOLR-17160
This is a simpler version of #2271, that now uses Caffeine cache instead of manually dropping old requests.
Description
Core admin completed/failed requests are tracked in-memory, but we only track a maximum of 100 requests by dropping the oldest ones. If for any reason we drop a request before the client (an other Solr node) fetched the status, we will get unexpected errors at the top level command.
Solution
Instead of the hard limit of 100 tracked requests, we track all requests in a Caffeine cache, configured to drop old requests.
This is a trade-off to not keep memory forever and to be sure we won't lose status in the case of many incoming requests.
The change adds two different timeouts:
"solr.admin.requests.running.timeout.minutes"
)"solr.admin.requests.completed.timeout.minutes"
)Tests
Added a unit test in
CoreAdminHandlerTest
to check expired requests are not tracked in-memory anymore.Also checks we don't have any issue for collection backup with very high numbers of shards.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.