3.x: parallel performs poorly with 10+ parallelism #6931

Open
akarnokd opened this issue Mar 11, 2020 · 3 comments

@akarnokd
Member

akarnokd commented Mar 11, 2020

For some reason, the parallel Scrabble benchmark performs poorly when the parallelism level is 10+, for example, on my i7 8700 CPU (6 cores/12 threads):

image

However, my older i7 4770K processor (4 cores/8 threads) shows no such performance degradation. Neither does the reactive-streams-commons implementation (the parent of RxJava's parallel implementation) with parallelism=12.
Correction: The Rsc benchmark was pinned to 8 threads and actually shows a similar inefficiency with 10+.

@akarnokd akarnokd added this to the 3.1 milestone Mar 11, 2020
@akarnokd akarnokd self-assigned this Mar 11, 2020
@akarnokd
Member Author

I did a different implementation but the degradation isn't gone, just reduced:

image

With the new code organization, the performance is slightly worse at P=1 and P=6 and somewhat better at higher Ps. The others are likely within the noise limit.

image

I'm starting to think the underlying issue is that one thread simply can't drive that many rails that fast, thus the round-robin dispatching will result in a high volume of scheduling activity (also hinted by Java Flight Recorder).
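To make that hypothesis concrete, here is a toy model in plain Java, not RxJava's actual dispatching code. It assumes a single dispatcher that emits one item per tick and hands it to rail `index % parallelism`, and a worker per rail that needs `workTicks` ticks per item. The class name, the tick model, and the numbers in `main` are all hypothetical. The sketch counts how often an item arrives at an idle rail, which is when that rail's worker would have to be scheduled again.

```java
final class RoundRobinWakeupSketch {

    /**
     * Counts how many times an item lands on an idle rail under round-robin dispatch.
     * Toy model (assumption): one item is dispatched per tick and each rail needs
     * workTicks ticks to process one item.
     */
    static long wakeups(long itemCount, int parallelism, long workTicks) {
        // Tick at which each rail's worker runs out of queued work.
        long[] busyUntil = new long[parallelism];
        long wakeups = 0;
        for (long tick = 0; tick < itemCount; tick++) {
            int rail = (int) (tick % parallelism); // round-robin rail selection
            if (busyUntil[rail] <= tick) {
                // The worker has already drained its queue: it must be scheduled again.
                wakeups++;
                busyUntil[rail] = tick + workTicks;
            } else {
                // The worker is still busy: the item just extends its backlog.
                busyUntil[rail] += workTicks;
            }
        }
        return wakeups;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1200 items, 10 ticks of work per item.
        for (int p : new int[] {1, 6, 12}) {
            System.out.println("P=" + p + " wakeups=" + wakeups(1200, p, 10));
        }
    }
}
```

In this model, once the number of rails exceeds the per-item work (in dispatch ticks), every rail finishes before its next item arrives, so each item costs a wakeup. Real schedulers behave less sharply, so this only illustrates the direction of the effect.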

@akarnokd
Member Author

If I implement batch-dispatching, the scheduling overhead appears to be mostly eliminated:
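A rough illustration of why batching can help, again as a toy model in plain Java and not the actual change. It assumes a single dispatcher that emits one item per tick, collects `batchSize` consecutive items, and hands the whole batch to the next rail in round-robin order. The class name, the tick model, and the batch size of 32 are hypothetical. A batch size of 1 corresponds to plain per-item round-robin.

```java
final class BatchDispatchSketch {

    /**
     * Counts how many times a batch lands on an idle rail.
     * Toy model (assumption): one item is produced per tick, a batch is handed over
     * when its last item is available, and a rail needs workTicks ticks per item.
     */
    static long wakeups(long itemCount, int parallelism, long workTicks, int batchSize) {
        // Tick at which each rail's worker runs out of queued work.
        long[] busyUntil = new long[parallelism];
        long wakeups = 0;
        long batchIndex = 0;
        for (long first = 0; first < itemCount; first += batchSize, batchIndex++) {
            long size = Math.min(batchSize, itemCount - first); // last batch may be short
            long arrival = first + size - 1;                    // tick of the batch's last item
            int rail = (int) (batchIndex % parallelism);        // round-robin over batches
            if (busyUntil[rail] <= arrival) {
                // Idle rail: one wakeup pays for the whole batch.
                wakeups++;
                busyUntil[rail] = arrival + size * workTicks;
            } else {
                // Busy rail: the batch just extends its backlog.
                busyUntil[rail] += size * workTicks;
            }
        }
        return wakeups;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1200 items, 12 rails, 10 ticks of work per item.
        System.out.println("per-item: " + wakeups(1200, 12, 10, 1));
        System.out.println("batch=32: " + wakeups(1200, 12, 10, 32));
    }
}
```

In this model the number of wakeups is bounded by the number of batches instead of the number of items, which is the amortization the batch-dispatching experiment relies on. The trade-off is that items wait longer before a rail sees them.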

image

@akarnokd akarnokd changed the title 3.x: parallel and/or p-reduce performs poorly with 10+ parallelism 3.x: parallel performs poorly with 10+ parallelism Mar 11, 2020
@muralik09

You have to consider a lot of aspects when making parallel calls.

If one request makes 10 parallel calls and your server supports only 12 threads, what about the second request? It will have to wait for the first request to release its threads.

You also have to check whether all 12 threads are actually allocated to your program.

etc...

@akarnokd akarnokd modified the milestones: 3.1, 3.1-support Aug 10, 2021