Skip to content

Replaces 2x string.find(..) with custom one using optimized folly:memchr #12879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

izard
Copy link

@izard izard commented Apr 1, 2025

This provides performance boost to two Like sub-benchmarks.
Depends on Folly PR: facebook/folly#2409

… implementation based on optimized memchr from folly. This provides performance boost to two Like subbenchmarks.

Depends on Folly PR: facebook/folly#2409
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 1, 2025
Copy link

netlify bot commented Apr 1, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit d224e70
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/68107c4c3e738c0008ca3d73

@embg
Copy link

embg commented Apr 1, 2025

This is a great find (pun intended). Will pull this in after we merge the folly PR.

while (remaining >= substr_len)
{
// Use optimized memchr variant
current = static_cast<const char*>(folly::__folly_memchr(current, first_char, remaining - substr_len + 1));
Copy link

@embg embg Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should never call __folly_memchr directly. You should call memchr and that symbol will resolve to __folly_memchr if FOLLY_MEMCHR_IS_MEMCHR is defined.

Copy link

@embg embg Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we did directly call __folly_memchr, how would this code run on x86? The folly PR only adds an implementation for arm64.

Edit: I see, there is a fallback in the folly PR, although it may be broken (commented on the folly PR).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for spotting! Will fix first thing tomorrow morning.

@embg
Copy link

embg commented Apr 1, 2025

Is this PR still a win when using the stdlib memchr? If so, we can merge it now as long as you replace the folly memchr with std::memchr. Wondering if we should do this since the folly PR might take a while to merge.

@izard
Copy link
Author

izard commented Apr 1, 2025

Is this PR still a win when using the stdlib memchr?

No, performance will stay exactly the same with stdlib memchr, all the speedup only comes from faster memory scanning for the first byte.

@embg
Copy link

embg commented Apr 1, 2025

@izard Understood. Let's revisit here after we figure out the situation with folly, then. If the win here is sufficiently large, we will find a way to merge it one way or another.

Btw, it would be helpful if tomorrow you can share your benchmark numbers :)

@izard
Copy link
Author

izard commented Apr 2, 2025

@izard Understood. Let's revisit here after we figure out the situation with folly, then. If the win here is sufficiently large, we will find a way to merge it one way or another.
Btw, it would be helpful if tomorrow you can share your benchmark numbers :)

Among Velox benchmarks, the only visible difference is ~20% and ~25% uplift in Like/substring and Like/substring and Like/strpos. Everything else is just the same.

@izard
Copy link
Author

izard commented Apr 2, 2025

Now using folly::memchr_long explicitly as discussed in folly PR discussion thread.

@embg
Copy link

embg commented Apr 2, 2025

I will work on pulling in the folly PR. Meantime, could you benchmark on x86 and verify that there is no regression? Since we only have optimized memchr_long for Arm.

@izard
Copy link
Author

izard commented Apr 8, 2025

Just added a commit fixing a bug I introduced earlier, this bug was not affecting the tests results but would backfire in real application.

@embg
Copy link

embg commented Apr 8, 2025

@izard

Just added a commit fixing a bug I introduced earlier

Thank you!

this bug was not affecting the tests results but would backfire in real application.

Would it be possible to add a test case? Or is it hard to expose in a test?

@embg
Copy link

embg commented Apr 8, 2025

Edit: sorry, commented on wrong PR. Moving my comment to the Folly PR.

@@ -19,6 +19,7 @@
#include <string>
#include <string_view>
#include "folly/CPortability.h"
#include "folly/FollyMemchr.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how recent does folly need to be to provide the feature needed?

Cc: @kgpai @assignUser

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very recent, depends on facebook/folly#2409 which is not yet even integrated.

@@ -311,6 +312,41 @@ cappedByteLengthUnicode(const char* input, size_t size, int64_t maxChars) {
return utf8Position;
}

constexpr size_t long_string_find(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add unit tests to exercise this function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

const char* const end = str.data() + str_len;
size_t remaining = str_len - start_pos;

while (remaining >= substr_len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@assignUser assignUser marked this pull request as draft May 7, 2025 15:38
@assignUser
Copy link
Collaborator

assignUser commented May 7, 2025

I have converted this to draft as the necessary folly PR isn't even landed. We will have to wait for it to be released and then bump versions here, which has the tendency to take longer than expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants