Concurrency issues investigation #25615
Should we do away with the query concurrency semaphore? I worry that it unnecessarily slows things down when a query is waiting on IO (i.e. another query could be executing, but won't because of the semaphore). There's definitely a point where we'll want to park incoming requests because of maxed-out resource utilization, I just doubt that a semaphore is the best primitive for this. I'd rather remove it for now and address query throttling later.
I think so. It looks like IOx does this in the querier by setting the limit to
Might be relevant: I think we synced this change over from Pro, but we did enable IO on the dedicated executor because without it we had crashes; the default of not having it enabled comes from IOx, as part of their split into separate executors. I wonder if it makes sense to have just one executor to avoid contention, in Monolith at least.
Let's remove the semaphore completely. One less thing to confound the investigation.
@mgattozzi it would be an interesting experiment to put everything on a single pool. Maybe we should give that a try too?
Can try to remove it completely, but I'm not sure it can be, because it is relied on by core traits used in the Flight service. But I think setting it to
Can we run it through a profiler to see where it's spending the time (maybe on the box that the perf team are using)?
That was incorrect, IOx doesn't set
I opened #25627 |
@hiltontj fyi only - the concurrency issue is global across the product, not just for the Last value cache. The details I have observed are:
As the next step, |
Problem statement
The performance team has found that the `influxdb3` process does not handle concurrency well. In #25604 we made a change very similar to what was done in IOx to move the query planning off of the main IO thread pool, but this did not improve things and instead seems to have led to a regression (see #25562 (comment)). This issue is for investigating why the system is breaking down under concurrent loads.
Ideas
Not configuring the main `tokio` runtime

Here we initialize the `tokio` runtime using the defaults:

influxdb/influxdb3/src/main.rs, line 116 in 0daa3f2
This means that the `tokio` runtime initializes with as many threads as there are CPU cores, so it could be competing with the DataFusion runtime as a result.

Hard-coding the query concurrency semaphore to `10`
Here we hard-code the query concurrency semaphore:

influxdb/influxdb3/src/commands/serve.rs, line 518 in 0daa3f2
This sets the semaphore limit here:
influxdb/influxdb3_server/src/query_executor.rs, lines 95 to 96 in 0daa3f2
Which is supposedly used by the flight service here:
influxdb/influxdb3_server/src/query_executor.rs, lines 353 to 358 in 0daa3f2