-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Avoid allocations in IndexFileNames.parseGeneration method #14920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
Your profiler is lying to you. Toss it and get a better one :) |
int third = (second != -1) ? filename.indexOf('_', second + 1) : -1; | ||
|
||
int parts = (second == -1 || second >= end) ? 2 : (third == -1 || third >= end) ? 3 : 4; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a massive increase in complexity of the code: the tradeoff isn't worth it.
It's a simple string method called once, doesn't even approach the cost of the system call to actually read the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our scenario, index updates (and we have ~6 different indexes) can happen every 5 seconds. It's profiling info from a 5-minute production run with a async-profiler (one of the best at the moment, imho). So, if these allocations were captured, they're not rare. Agree about complexity. But this method now should be also 5-10x faster if that matters for you. If you want, I can provide you with a microbenchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this doesn't look like the right tradeoff to me. I'd recommend using a better profiler.
I don't see IndexFileNames anywhere in the CPU nor heap profiling of our benchmarks:
IndexFileNames.parseGeneration
allocates multiple unnecessary strings (3-6 strings per call) + string[] objects:With often reindexing this allocations become notable during profiling. These allocations could be easily eliminated with a manual underscore search. After the fix only 1 string will be allocated in the worst case.