Replace RingBuffer internals with VecDeque #78
This is a new attempt at replacing the internals of `RingBuffer` with `VecDeque`.

EDIT: I've also pushed the commit removing unsafe code, since most of the regression comes from the `VecDeque` and not from removing unsafe. It may make sense to let the implementation be chosen via a feature gate.

This is still WIP: safety comments need to be added, and a lot of documentation is needed for the many hacks we're doing to implement `extend_from_within` in a performant way despite `VecDeque` not exposing its own internals.

Compared to the previous attempts, I think LLVM has made many advancements; combined with the `VecDeque::extend` specializations, better handling of `MaybeUninit`, and the idea of using `copy_bytes_overshooting`, this should make the performance penalty much lower.

It should also be possible, through a feature gate, to replace all use of unsafe code and `MaybeUninit` with zero-initialized memory, for a smaller performance penalty than going back to the original `Vec` implementation from years ago.

Running benchmarks on dedicated hardware (a Ryzen 5900X), the performance penalty of the current draft implementation is about 8% on the builtin bench.
Using a 150MB file that decompressed to 1 GB yielded this instead: