Replace bit shift with __builtin_ctzll in HyperLogLog #13218
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Replace bit shift with
__builtin_ctzll
in HyperLogLogBuiltin function
__builtin_ctzll
is more effective than bit shift even though "in the average case there are high probabilities to find a 1 after a few iterations" mentioned in the source file comment.I wrote a program to test whether it's more effective. Let me try to explain the test cases and the result.
Here is the source code.
There are 4
define
cases in the program:RANDOM
: just generate random uint64_t. This is a base time cost when the next two cases is run.BITSHIFT
: counting the trailing zeros of the random numbers with bit shift method.BUILTIN
: counting the trailing zeros of the random numbers with builtin__builtin_ctzll
CHECK
: call two functions and compare their results; print out the distribution of tailing zeros length.More explainations:
ret
storing the sum of trailing zeros length, is use to void skipping the process when-O2
flag is used.CHECK
case can cover more long trailing zeros numbers, I left-shift the random number:num = (num << (n % 50)) | ((uint64_t)1 << 51);
Now let me show the result:
1. Run first 3 cases and compare the time
The result is as following:
After removing the random number generating costs, we got this(much faster):
Meanwhile the
ret
of two cases is the same on. This means the correction of the new method.2. Run check case
As mentioned before, I left shifted the number. The result of two different counting method for each random number is same. And the distribution of trailing zeros length is as following:
3. Conclusion
The builtin function
__builtin_ctzll
is correct in our case and much more effective than raw bit shift.A replacement will bring a significant help.