How does this crate compare to StringZilla? #159
-
StringZilla is an impressive project that provides many interesting string primitives with SIMD acceleration, certainly many more than this crate does. The scope of StringZilla is a fair bit broader than

First, in the README (commit
I agree, the

With regard to reverse substring searches, I made a very intentional decision not to optimize that case with SIMD inside of

Unlike StringZilla though, the

As linked above, StringZilla also provides a separate repository for a targeted benchmark between StringZilla and the

That's not to say that measuring searcher construction is invalid. But it should be one dimension of a good benchmark, and it should absolutely be disclosed in the discussion of results.

Why does the benchmark include measurements of searcher construction and no benchmarks without it? That's hard to say precisely, but one possible answer is that StringZilla actually doesn't support it! If you look at its Rust API, it doesn't provide a way to build a searcher for a needle independently of the haystack. (Its

I discussed this a bit in a Reddit comment as well, which includes interactions with the author of StringZilla where I bring this issue up. But from my perspective, this criticism was not well received. Now of course,
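To make the prebuilt-versus-oneshot distinction concrete, here is a minimal, std-only sketch. In the `memchr` crate, the real counterparts are `memchr::memmem::Finder::new(needle)` (built once, then `finder.find(haystack)` called many times) and the oneshot `memchr::memmem::find(haystack, needle)`. The `PrebuiltFinder` type and `find_oneshot` function below are hypothetical names for illustration, and they use a simple Boyer-Moore-Horspool skip table; this is not how `memchr::memmem` is implemented internally.

```rust
/// Hypothetical needle-only searcher (Boyer-Moore-Horspool), used only to
/// illustrate "build once, search many". NOT the memchr implementation.
struct PrebuiltFinder {
    needle: Vec<u8>,
    // For each byte value: how far to shift the window when that byte is
    // the last byte of a non-matching window.
    skip: [usize; 256],
}

impl PrebuiltFinder {
    /// Construction depends only on the needle, so its cost can be paid once
    /// and amortized over many haystacks.
    fn new(needle: &[u8]) -> PrebuiltFinder {
        let n = needle.len();
        let mut skip = [n.max(1); 256];
        if n > 0 {
            // Later occurrences overwrite earlier ones, giving the smallest shift.
            for (i, &b) in needle[..n - 1].iter().enumerate() {
                skip[b as usize] = n - 1 - i;
            }
        }
        PrebuiltFinder { needle: needle.to_vec(), skip }
    }

    /// Returns the byte offset of the first occurrence of the needle, if any.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        let n = self.needle.len();
        if n == 0 {
            return Some(0);
        }
        let mut pos = 0;
        while pos + n <= haystack.len() {
            let window = &haystack[pos..pos + n];
            if window == self.needle.as_slice() {
                return Some(pos);
            }
            pos += self.skip[window[n - 1] as usize];
        }
        None
    }
}

/// Hypothetical "oneshot" search: rebuilds the searcher on every call, so the
/// construction cost is charged to each individual search.
fn find_oneshot(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    PrebuiltFinder::new(needle).find(haystack)
}

fn main() {
    // Build once...
    let finder = PrebuiltFinder::new(b"needle");
    let haystacks: [&[u8]; 3] = [b"a needle here", b"no match", b"needle"];
    // ...then reuse the same skip table for every haystack.
    for h in haystacks {
        println!("{:?}", finder.find(h));
    }
    println!("{:?}", find_oneshot(b"haystack with needle", b"needle"));
}
```

A benchmark that only ever measures something shaped like `find_oneshot` folds needle preprocessing into every search. That cost is negligible for very large haystacks but can dominate when haystacks are small and the same needle is searched repeatedly.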
To test measurements before capturing them, run:
This should complete successfully in reasonable time. If it fails, then something has gone wrong that needs to be debugged. Otherwise, run measurements and capture the results:
Now we can rank them overall via the geometric mean of speed ratios recorded for each benchmark:
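The ranking step can be sketched as follows. This assumes that, for each benchmark, an engine's speed ratio is its time divided by the fastest engine's time on that benchmark (so 1.0 is best and larger means slower), and that the overall score is the geometric mean of those ratios. The function names and the timings in `main` are made up for illustration; they are not results from the benchmark above.

```rust
/// For one benchmark: each engine's time divided by the fastest engine's time.
fn speed_ratios(times: &[f64]) -> Vec<f64> {
    let best = times.iter().cloned().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / best).collect()
}

/// Geometric mean, computed in log space to avoid overflow on long inputs.
fn geometric_mean(ratios: &[f64]) -> f64 {
    let sum_ln: f64 = ratios.iter().map(|r| r.ln()).sum();
    (sum_ln / ratios.len() as f64).exp()
}

fn main() {
    // Hypothetical timings (ns) for two engines across three benchmarks.
    // Each inner array is one benchmark: [engine 0, engine 1].
    let benchmarks = [[10.0, 20.0], [30.0, 15.0], [5.0, 10.0]];

    // Collect each engine's ratio on every benchmark.
    let mut per_engine: Vec<Vec<f64>> = vec![Vec::new(); 2];
    for times in &benchmarks {
        for (engine, r) in speed_ratios(times).into_iter().enumerate() {
            per_engine[engine].push(r);
        }
    }

    // Lower overall score means faster across the board.
    for (engine, ratios) in per_engine.iter().enumerate() {
        println!("engine {}: {:.3}", engine, geometric_mean(ratios));
    }
}
```

The geometric mean is a natural way to average ratios: it is less sensitive than the arithmetic mean to a single benchmark with a very large ratio, so one pathological case does not dominate the overall ranking.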
The

But this result is already revealing: if one uses a prebuilt searcher from

We can also look at the benchmark results in more detail:
Notice how this crate is actually quite a bit faster than StringZilla on almost every benchmark when the searcher is prebuilt. (In many of these benchmarks, prebuilding the searcher doesn't matter because the haystack is so big. But we'll compare oneshot searching next.) The main cases where StringZilla is faster are pathological.

A oneshot comparison is more apples-to-apples, but like StringZilla's benchmark, it omits the speed improvements that come from prebuilding the searcher when that's possible:
Notice that in benchmarks with a large haystack and relatively low match frequency,

I have not studied the source code of StringZilla in detail, but

So, in summary, I think the StringZilla materials:
Replies: 1 comment
-
I answered the question in the OP.