ByteSlice functions to skip past or until a set of bytes. #13
Comments
I think adding these is a good idea. I did consider doing this initially, but decided to just punt on it because it was extra work. But I do think they belong. In particular, it's not only Looking at your functions, I think they mostly look good, although I'd probably modify their API to return indices instead of slices, which is more consistent with the APIs in We probably also want to add a Again, a PR for this would be welcome, but it's likely something I'll do eventually otherwise. The implementation you have looks fine for a start. The next steps after that (if you're interested) would be to add benchmarks for them and optimize them with PCMPESTRI.
Sure, I'll try to do this after work.
Hm, okay, that is more general (I imagine the output would frequently get sliced right away, though). In that case, maybe the functions should be named similarly to the That is, maybe they'd be named I'll start with these 4 as they're the closest to what I already wrote, and are the simplest as well. (It's also unclear to me how to continue that naming scheme to others where it might apply, which might mean it's not a good name...)
Ah right, I forgot there were some pretty crazy instructions in the later SSEs...
Hmmm. Those names don't seem too bad to me. I also don't mind the C++ flavored names, Maybe including
Yeah,
Yeah, I was going to suggest
Sadly
I still kind of like I do however like
Yeah, and then those could just be a suffix you'd use on any function that gets added that takes a set of bytes in that way, similar to how But I'm also willing to go with find_set_bytes (not sure how to negate it) if you'd prefer. It's not that weird, just not naturally how I'd think about it.
SGTM. (For
It's common to want to skip past/until a specific set of bytes. C++'s std::string::find_first_of/find_first_not_of are an example.
The current API has trim_start_with and trim_end_with, which can replace some uses of this, but they require Unicode (#12) and are often much slower than an implementation that leverages memchr when the set of bytes is small.
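To make the request concrete, here is a minimal sketch of byte-set searches in the style of C++'s find_first_of/find_first_not_of. The names `find_first_of` and `find_first_not_of` are hypothetical and are not part of the crate's API; the sketch uses only the standard library and a plain linear scan, so it shows the intended semantics rather than the memchr-accelerated implementation discussed here. It returns indices rather than slices.

```rust
/// Hypothetical helper (not an existing API): index of the first byte in
/// `haystack` that is a member of `set`, or `None` if there is no such byte.
fn find_first_of(haystack: &[u8], set: &[u8]) -> Option<usize> {
    // Naive scan; a tuned version could dispatch to memchr for small sets.
    haystack.iter().position(|b| set.contains(b))
}

/// Hypothetical helper (not an existing API): index of the first byte in
/// `haystack` that is NOT a member of `set`, or `None` if every byte is in `set`.
fn find_first_not_of(haystack: &[u8], set: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| !set.contains(b))
}
```

A caller that wants the "skip" behaviour can slice with the returned index, for example `&haystack[find_first_not_of(haystack, b" \t").unwrap_or(haystack.len())..]` to skip leading spaces and tabs.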
I had some helpers for this, and ended up being disappointed that I couldn't really replace them (I could replace some uses of them with trim_start_with, but it would possibly be slower, due to both the extra UTF-8 decoding and not being able to use memchr in the until case; both could be accelerated in the same manner, of course).
My implementations (skip_until/skip_while) are here: https://gist.github.com/thomcc/a39c9bf5c7c50b0db1e5f1d4f92429a7 in case that's interesting or it's unclear what I mean.
Additionally, while I wouldn't have used them, presumably versions starting from the right, and versions along the lines of fields, would be helpful. (Actually, had a fields version of this existed, it would probably have replaced some of my uses of these functions.)
That said, this is getting to be a lot of functions 🙁 -- obviously this could be done with some pattern-esque API, but you mentioned not being interested in a design like that (which I completely respect, and think makes the docs much clearer). I don't think this case is that niche, but it might be too niche given how many functions it would have.
Anyway, I don't think this is that niche, but I'd understand a desire not to increase the number of functions too far.
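For the right-to-left and fields-like variants mentioned above, a rough sketch could look like the following. Both names (`rfind_last_of`, `fields_by_set`) are hypothetical, and the fields-like behaviour shown is an assumption: runs of separator bytes are collapsed and empty fields are dropped. Only the standard library is used.

```rust
/// Hypothetical helper (not an existing API): index of the last byte in
/// `haystack` that is a member of `set`, searching from the right.
fn rfind_last_of(haystack: &[u8], set: &[u8]) -> Option<usize> {
    haystack.iter().rposition(|b| set.contains(b))
}

/// Hypothetical helper (not an existing API): split `haystack` on any byte in
/// `set`, dropping empty fields so that runs of separators act as one.
fn fields_by_set<'a>(haystack: &'a [u8], set: &[u8]) -> Vec<&'a [u8]> {
    haystack
        .split(|b| set.contains(b))
        .filter(|field| !field.is_empty())
        .collect()
}
```

Returning a `Vec` keeps the sketch short; an actual API would more likely return a lazy iterator.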