Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement functions for searching bytesets #15

Merged
merged 3 commits into from
Jul 21, 2019

Conversation

thomcc
Copy link
Contributor

@thomcc thomcc commented Jul 18, 2019

Sorry this took so long. I didn't end up doing any SIMD-optimized versions, but I did optimize the single-byte version of the find_not predicate using SWAR-ish operations taken more or less from memchr with a few modifications to make the conditions inverted.

It was not clear to me how to implement the 2-byte and 3-byte versions of the find_not functions in the same manner (although I'm probably just not being sufficiently clever).

It includes benchmarks, but not terribly robust ones.

Fixes #13

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is just lovely! Thank you so much for doing this. I left a number of comments, and I think all of them are very small changes other than a request to add more test coverage for inv_memchr1.

src/byteset/mod.rs Outdated Show resolved Hide resolved
src/byteset/scalar.rs Outdated Show resolved Hide resolved
src/byteset/scalar.rs Outdated Show resolved Hide resolved
src/byteset/scalar.rs Outdated Show resolved Hide resolved
src/byteset/scalar.rs Show resolved Hide resolved
fn qc_byteset_backwards_not_matches_naive(haystack: Vec<u8>, needles: Vec<u8>) -> bool {
super::rfind_not(&haystack, &needles) == haystack.iter().rposition(|b| !needles.contains(b))
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of feel like this might not be enough test coverage. Most of the code here is simple enough that I'd probably be okay with it, but the inv_memchr1 function is not as simple and probably deserves more testing. A particularly important part of testing code like inv_memchr1 is to make sure all offsets are tested, which is what the memchr crate does: https://github.com/BurntSushi/rust-memchr/blob/665fffcc9506ac1cd93339470d2ca4b018e648c6/src/tests/mod.rs#L353-L374

I realize this is perhaps more work than you wanted to take on. I would totally be okay with leaving the extra tests out of this PR, but I'd probably insist on dropping inv_memchr1 in addition to that. (You could just throw it up into another PR without tests, and I'd be happy to add tests to it later when I get a chance. It's up to you and your time/desire budget. :-))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding more tests doesn't bother me, I was just being lazy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added what I think amounts to equivalent tests to what you have for memchr 1 byte, although it's possible it's still not ideal, let me know.

src/ext_slice.rs Outdated Show resolved Hide resolved
src/ext_slice.rs Outdated Show resolved Hide resolved
src/ext_slice.rs Outdated Show resolved Hide resolved
@thomcc
Copy link
Contributor Author

thomcc commented Jul 18, 2019

(I'll fix the unused import after work, but I'm not sure the nightly bustage is related to me, seems to be some rustfmt change?)

@thomcc
Copy link
Contributor Author

thomcc commented Jul 19, 2019

Yeah, cargo +nightly fmt and cargo fmt seem to disagree about the changes that need to be made.

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thanks! For CI, the rustfmt stuff seems unfortunate. Could you try formatting with cargo +nightly fmt?

Also, there are some long lines in this PR that I had thought rustfmt would take care of automatically. Are you perhaps not using the rustfmt.toml in this repo somehow?

Otherwise, things look great!

src/byteset/mod.rs Outdated Show resolved Hide resolved
}
}
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you. :-)

@thomcc
Copy link
Contributor Author

thomcc commented Jul 20, 2019

cargo +nightly fmt does make other changes (to functions inside src/utf8.rs and src/ascii.rs), but after making them, cargo fmt --check (off of nightly) fails. I don't think generally it's good to run rustfmt on multiple toolchains in CI.

@BurntSushi
Copy link
Owner

I might be missing something here. rustfmt should only be run against nightly in CI?

bstr/ci/script.sh

Lines 18 to 22 in 11ceee9

if [ "$TRAVIS_RUST_VERSION" = "nightly" ]; then
rustup component add rustfmt --toolchain nightly-x86_64-unknown-linux-gnu
cargo fmt -- --check
cargo bench --verbose --manifest-path bench/Cargo.toml -- --test
fi

I've only recently started using rustfmt, so I'm still working out the kinks. In any case, I just pushed a run of cargo +nightly fmt to master, so if you rebase, you should get those changes. CI should pass then. The master build just passed: https://travis-ci.org/BurntSushi/bstr/builds/561495045

@thomcc
Copy link
Contributor Author

thomcc commented Jul 20, 2019

rustfmt should only be run against nightly in CI?

Oh. My bad, in the future I'll only run it against nightly rustfmt. I had incorrectly assumed it had been running against all toolchains, so pushing up a nightly fix would only regress stable/beta.

Sorry about that.

@BurntSushi
Copy link
Owner

No worries. Now I'm wondering whether CI should be using nightly or stable rustfmt... 🤔 But that's for another time.

@thomcc
Copy link
Contributor Author

thomcc commented Jul 20, 2019

Stable is easier on contributors, and less prone to failures on unrelated patches, but it's not a big deal either way.

Do you have anything else you'd like me to do here?

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, looks great! Thanks for your patience. :-) I'll consider moving to stable rustfmt.

@BurntSushi BurntSushi merged commit 110222d into BurntSushi:master Jul 21, 2019
@BurntSushi
Copy link
Owner

This PR is in bstr 0.2.3 on crates.io! (Along with an update to Unicode 12.1.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ByteSlice functions to skip past or until a set of bytes.
2 participants