[ML] Refactoring http settings and adding stats endpoint #108219
Conversation
"xpack.inference.http.max_connections", | ||
public static final Setting<Integer> MAX_TOTAL_CONNECTIONS = Setting.intSetting( | ||
"xpack.inference.http.max_total_connections", | ||
50, // default |
Increasing this to 50 by default so that it exceeds the maximum for a single route, to avoid a single service exhausting the entire connection pool.
```java
 * The max number of connections a single route can lease.
 */
public static final Setting<Integer> MAX_ROUTE_CONNECTIONS = Setting.intSetting(
    "xpack.inference.http.max_route_connections",
```
Allowing the per-route limit to be controlled separately. I don't expect users to modify these settings often, if ever. I could see these being used if we run into a situation in serverless or an SDH where we can help increase throughput.
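The interplay between the total and per-route caps can be sketched with a minimal stand-in pool (plain Java, no Apache HttpClient dependency). The class, the route names, and the per-route cap of 20 used below are illustrative assumptions, not the actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Illustrative stand-in for a pooling connection manager: a global cap
// (max_total_connections) plus an independent per-route cap
// (max_route_connections), so one busy route cannot drain the whole pool.
class SketchPool {
    private final Semaphore total;
    private final int perRouteLimit;
    private final Map<String, Semaphore> perRoute = new ConcurrentHashMap<>();

    SketchPool(int maxTotal, int maxPerRoute) {
        this.total = new Semaphore(maxTotal);
        this.perRouteLimit = maxPerRoute;
    }

    // Returns true if a connection for the route could be leased.
    boolean tryLease(String route) {
        Semaphore routeSem = perRoute.computeIfAbsent(route, r -> new Semaphore(perRouteLimit));
        if (!routeSem.tryAcquire()) {
            return false; // this route is at its own cap
        }
        if (!total.tryAcquire()) {
            routeSem.release();
            return false; // whole pool exhausted
        }
        return true;
    }

    public static void main(String[] args) {
        // 50 total as in this PR; a per-route cap of 20 is assumed for illustration.
        SketchPool pool = new SketchPool(50, 20);
        int leased = 0;
        while (pool.tryLease("openai")) leased++;
        // The single route stops at its own cap, leaving capacity for others.
        System.out.println(leased);                  // 20
        System.out.println(pool.tryLease("cohere")); // true
    }
}
```

Because the total cap (50) exceeds any single route's cap, one service hitting its limit still leaves connections available for other services.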
```java
private IdleConnectionEvictor connectionEvictor;
private final HttpClient httpClient;

private volatile TimeValue evictionInterval;
```
The common pattern is to use `volatile` for settings, so moving to that.
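The pattern referred to here can be sketched without the Elasticsearch `Setting` machinery (class and field names below are illustrative, and `java.time.Duration` stands in for `TimeValue`): a dynamic setting lives in a volatile field that a settings-update consumer overwrites, so reader threads see the new value immediately without locking.

```java
import java.time.Duration;

// Sketch of the dynamic-setting pattern: the value is kept in a volatile
// field so a write from the settings-update thread is immediately visible
// to threads reading it, with no synchronization.
class EvictionSettings {
    private volatile Duration evictionInterval = Duration.ofMinutes(1); // assumed default

    // In Elasticsearch this would typically be registered as a settings-update
    // consumer so cluster-setting changes call it at runtime.
    void setEvictionInterval(Duration interval) {
        this.evictionInterval = interval;
    }

    Duration evictionInterval() {
        return evictionInterval;
    }

    public static void main(String[] args) {
        EvictionSettings settings = new EvictionSettings();
        settings.setEvictionInterval(Duration.ofSeconds(30));
        System.out.println(settings.evictionInterval()); // PT30S
    }
}
```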
@elasticmachine merge upstream
Pinging @elastic/ml-core (Team:ML)
LGTM!
This PR adds some comments documenting the connection eviction and keep-alive strategy for the Apache HTTP client implementation. It also increases the connection pool limit so that it exceeds a single route's maximum. I've run into a situation during testing where a single service can lease all the connections in the pool, effectively blocking any other service from leasing a connection.
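For context on the eviction side, here is a minimal sketch of what an idle-connection evictor does, assuming a periodic task and an idle threshold. All names are illustrative, not the PR's actual `IdleConnectionEvictor`, and the real client delegates eviction to its connection manager rather than scanning a map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified model of idle-connection eviction: a background task runs on a
// fixed interval and drops pooled connections that have been idle for longer
// than a threshold.
class IdleEvictorSketch {
    record Conn(long lastUsedMillis) {}

    private final Map<String, Conn> idlePool = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void put(String id, long lastUsedMillis) {
        idlePool.put(id, new Conn(lastUsedMillis));
    }

    int size() {
        return idlePool.size();
    }

    // Evict everything idle longer than maxIdleMillis.
    void evictIdle(long maxIdleMillis) {
        long now = System.currentTimeMillis();
        idlePool.values().removeIf(c -> now - c.lastUsedMillis() > maxIdleMillis);
    }

    // Schedule periodic eviction, analogous to an eviction-interval setting.
    void start(long intervalMillis, long maxIdleMillis) {
        scheduler.scheduleAtFixedRate(
            () -> evictIdle(maxIdleMillis), intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        IdleEvictorSketch evictor = new IdleEvictorSketch();
        long now = System.currentTimeMillis();
        evictor.put("stale", now - 120_000); // idle for two minutes
        evictor.put("fresh", now);
        evictor.evictIdle(60_000);           // one-minute idle threshold
        System.out.println(evictor.size()); // 1 ("fresh" survives)
    }
}
```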
I also created a stats endpoint at `/_inference_stats`. I was hoping to use `/_inference/_stats`, but theoretically a user could have that as their inference endpoint id 🤷‍♂️

Right now the endpoint returns stats for the internal Apache connection pool. I think this could be useful because it could illuminate whether a cluster has reached the maximum connections in the pool.
Example response: