-
-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement functions for searching bytesets #15
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is just lovely! Thank you so much for doing this. I left a number of comments, and I think all of them are very small changes other than a request to add more test coverage for inv_memchr1
.
fn qc_byteset_backwards_not_matches_naive(haystack: Vec<u8>, needles: Vec<u8>) -> bool { | ||
super::rfind_not(&haystack, &needles) == haystack.iter().rposition(|b| !needles.contains(b)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of feel like this might not be enough test coverage. Most of the code here is simple enough that I'd probably be okay with it, but the inv_memchr1
function is not as simple and probably deserves more testing. A particularly important part of testing code like inv_memchr1
is to make sure all offsets are tested, which is what the memchr crate does: https://github.com/BurntSushi/rust-memchr/blob/665fffcc9506ac1cd93339470d2ca4b018e648c6/src/tests/mod.rs#L353-L374
I realize this is perhaps more work than you wanted to take on. I would totally be okay with leaving the extra tests out of this PR, but I'd probably insist on dropping inv_memchr1
in addition to that. (You could just throw it up into another PR without tests, and I'd be happy to add tests to it later when I get a chance. It's up to you and your time/desire budget. :-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding more tests doesn't bother me, I was just being lazy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added what I think amounts to equivalent tests to what you have for memchr 1 byte, although it's possible it's still not ideal, let me know.
(I'll fix the unused import after work, but I'm not sure the nightly bustage is related to me, seems to be some rustfmt change?) |
Yeah, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great thanks! For CI, the rustfmt stuff seems unfortunate. Could you try formatting with cargo +nightly fmt
?
Also, there are some long lines in this PR that I had thought rustfmt
would take care of automatically. Are you perhaps not using the rustfmt.toml
in this repo somehow?
Otherwise, things look great!
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you. :-)
|
I might be missing something here. Lines 18 to 22 in 11ceee9
I've only recently started using |
Oh. My bad, in the future I'll only run it against nightly rustfmt. I had incorrectly assumed it had been running against all toolchains, so pushing up a nightly fix would only regress stable/beta. Sorry about that. |
No worries. Now I'm wondering whether CI should be using nightly or stable rustfmt... 🤔 But that's for another time. |
Stable is easier on contributors, and less prone to failures on unrelated patches, but it's not a big deal either way. Do you have anything else you'd like me to do here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, looks great! Thanks for your patience. :-) I'll consider moving to stable rustfmt.
This PR is in |
Sorry this took so long. I didn't end up doing any SIMD-optimized versions, but I did optimize the single-byte version of the find_not predicate using SWAR-ish operations taken more or less from
memchr
with a few modifications to make the conditions inverted.It was not clear to me how to implement the 2-byte and 3-byte versions of the find_not functions in the same manner (although I'm probably just not being sufficiently clever).
It includes benchmarks, but not terribly robust ones.
Fixes #13