-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Replaces 2x string.find(..) with custom one using optimized folly:memchr #12879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… implementation based on optimized memchr from folly. This provides performance boost to two Like subbenchmarks. Depends on Folly PR: facebook/folly#2409
✅ Deploy Preview for meta-velox canceled.
|
This is a great find (pun intended). Will pull this in after we merge the folly PR. |
while (remaining >= substr_len) | ||
{ | ||
// Use optimized memchr variant | ||
current = static_cast<const char*>(folly::__folly_memchr(current, first_char, remaining - substr_len + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should never call __folly_memchr
directly. You should call memchr
and that symbol will resolve to __folly_memchr
if FOLLY_MEMCHR_IS_MEMCHR
is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we did directly call __folly_memchr
, how would this code run on x86? The folly PR only adds an implementation for arm64.
Edit: I see, there is a fallback in the folly PR, although it may be broken (commented on the folly PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for spotting! Will fix first thing tomorrow morning.
Is this PR still a win when using the stdlib memchr? If so, we can merge it now as long as you replace the folly memchr with std::memchr. Wondering if we should do this since the folly PR might take a while to merge. |
No, performance will stay exactly the same with stdlib memchr, all the speedup only comes from faster memory scanning for the first byte. |
@izard Understood. Let's revisit here after we figure out the situation with folly, then. If the win here is sufficiently large, we will find a way to merge it one way or another. Btw, it would be helpful if tomorrow you can share your benchmark numbers :) |
Among Velox benchmarks, the only visible difference is ~20% and ~25% uplift in Like/substring and Like/substring and Like/strpos. Everything else is just the same. |
Now using folly::memchr_long explicitly as discussed in folly PR discussion thread. |
I will work on pulling in the folly PR. Meantime, could you benchmark on x86 and verify that there is no regression? Since we only have optimized memchr_long for Arm. |
…rs type from char* to string_view
find should not always start with 0!
Just added a commit fixing a bug I introduced earlier, this bug was not affecting the tests results but would backfire in real application. |
Thank you!
Would it be possible to add a test case? Or is it hard to expose in a test? |
Edit: sorry, commented on wrong PR. Moving my comment to the Folly PR. |
@@ -19,6 +19,7 @@ | |||
#include <string> | |||
#include <string_view> | |||
#include "folly/CPortability.h" | |||
#include "folly/FollyMemchr.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how recent does folly need to be to provide the feature needed?
Cc: @kgpai @assignUser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very recent, depends on facebook/folly#2409 which is not yet even integrated.
@@ -311,6 +312,41 @@ cappedByteLengthUnicode(const char* input, size_t size, int64_t maxChars) { | |||
return utf8Position; | |||
} | |||
|
|||
constexpr size_t long_string_find( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add unit tests to exercise this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
const char* const end = str.data() + str_len; | ||
size_t remaining = str_len - start_pos; | ||
|
||
while (remaining >= substr_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also review our contributing and coding style guidelines:
https://github.com/facebookincubator/velox/blob/main/CONTRIBUTING.md
https://github.com/facebookincubator/velox/blob/main/CODING_STYLE.md
I have converted this to draft as the necessary folly PR isn't even landed. We will have to wait for it to be released and then bump versions here, which has the tendency to take longer than expected. |
This provides performance boost to two Like sub-benchmarks.
Depends on Folly PR: facebook/folly#2409