Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace RingBuffer internals with VecDeque #78

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

paolobarbolini
Copy link
Contributor

@paolobarbolini paolobarbolini commented Nov 29, 2024

This is a new attempt at replacing the internals of RingBuffer with VecDeque.

EDIT: I've also pushed the commit removing unsafe, since most of the regression is with the VecDeque and not with removing unsafe. It may have sense to be able to choose which implementation to use based on a feature gate.

This is still WIP, unsafe comments need to be added and a lot of comments need to be added to document the many hacks we're doing to implement extend_from_within in a performant way despite VecDeque not exposing it's own internals.

Compared to the previous attempts I think LLVM made many advancements, combined with VecDeque::extend specializations, better handling of MaybeUninit and the idea to use copy_bytes_overshooting should make the performance penalty much lower.

It should also be possible, through a feature gate, to replace all use of unsafe code and MaybeUninit with zero initialized memory for a smaller performance penalty than going back to the original Vec implementation from years ago.

Running benchmarks on dedicated hardware on a Ryzen 5900X the performance penalty of the current draft implementation is about 8% on the builtin bench.

Using a 150MB file that decompressed to 1 GB yielded this instead:

# master

decode_all_slice        time:   [1.7168 s 1.7227 s 1.7289 s]

# this branch

decode_all_slice        time:   [2.0467 s 2.0557 s 2.0644 s]
                        change: [+18.628% +19.329% +20.029%] (p = 0.00 < 0.05)
                        Performance has regressed.

# this branch with all unsafe removed

decode_all_slice        time:   [2.1711 s 2.1795 s 2.1873 s]
                        change: [+5.3898% +6.0207% +6.6085%] (p = 0.00 < 0.05)
                        Performance has regressed.

@paolobarbolini paolobarbolini force-pushed the vecdeque-as-ringbuffer branch 2 times, most recently from 4d75327 to 7a9d4bc Compare November 30, 2024 11:11
@KillingSpark
Copy link
Owner

Thanks for trying this out, it's important to check that having our own ringbuffer is still worth it.

A 20% performance hit would justify keeping it for the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants