[WIP] sstable: add adaptive compression #4497
7d08f03 to c86227b
Uncompressible Workload Stats

Adaptive with 80% Heuristic
Benchmarkycsb/A/values=1000-10000 16552 27.6 ops/sec 85997057619 read 223965901255 write 8.32 r-amp 1.63 w-amp
total size of tables: 141MB

Adaptive with 50% Heuristic
Benchmarkycsb/A/values=1000-10000 16439 27.4 ops/sec 86044986132 read 224066957272 write 8.25 r-amp 1.63 w-amp
total size of tables: 156MB

Zstd
Benchmarkycsb/A/values=1000-10000 16931 28.2 ops/sec 81646976354 read 224230627760 write 8.47 r-amp 1.57 w-amp
total size of tables: 138MB

Snappy
Benchmarkycsb/A/values=1000-10000 16573 27.6 ops/sec 86924167042 read 224030946919 write 7.92 r-amp 1.64 w-amp
total size of tables: 150MB

50% compressibility Workload Stats

Adaptive with 80% heuristic
Benchmarkycsb/A/values=1000-10000/2 17295 28.8 ops/sec 80832314381 read 224344394838 write 6.43 r-amp 1.56 w-amp
size: 103MB

Adaptive with 50% heuristic
Benchmarkycsb/A/values=1000-10000/2 17077 28.5 ops/sec 81497578031 read 224367456236 write 6.01 r-amp 1.57 w-amp
size: 73MB

Zstd
Benchmarkycsb/A/values=1000-10000/2 19529 32.5 ops/sec 63015426484 read 224549902202 write 6.06 r-amp 1.39 w-amp
size: 44MB

Snappy
Benchmarkycsb/A/values=1000-10000/2 16977 28.3 ops/sec 82733126436 read 224341226716 write 6.32 r-amp 1.59 w-amp
size: 102MB

90% compressibility Workload Stats

Adaptive with 80% heuristic
Benchmarkycsb/A/values=1000-10000/10.0 23742 39.6 ops/sec 27523974552 read 225339873543 write 3.38 r-amp 1.14 w-amp
size: 7.3MB

Adaptive with 50% heuristic
Benchmarkycsb/A/values=1000-10000/10.0 23527 39.2 ops/sec 27682693967 read 225420934838 write 3.40 r-amp 1.14 w-amp
size: 7.5MB

Zstd
Benchmarkycsb/A/values=1000-10000/10 23680 39.5 ops/sec 27894399045 read 225267953317 write 3.30 r-amp 1.14 w-amp
size: 7.7MB

Snappy
Benchmarkycsb/A/values=1000-10000/10 22679 37.8 ops/sec 36271308951 read 225194025507 write 2.96 r-amp 1.19 w-amp
size: 9.2MB
Out of curiosity, are you specifying target compression numbers for these benchmarks? I think we'd want to have benchmarks with values which compress well with both Zstd and Snappy, values which compress well only with Zstd and not Snappy, and values that don't compress well at all.
The adaptive heuristic you're using seems to be prioritizing Zstd compression even if that compression is not significantly better than Snappy. This might be a pessimization if Zstd is consuming more CPU than Snappy. Concretely, I can imagine data that gets better than 50% compression with both Zstd and Snappy (i.e. highly compressible data), but uses less CPU to compress using Snappy.
Reviewable status: 0 of 6 files reviewed, all discussions resolved
c86227b to adcbb03
Add adaptive compression with zstd and snappy. zstd compression is probed with exponential backoff when compression ratios are not better than 50%.
adcbb03 to ec2ecd9
Add adaptive compression with zstd and snappy. zstd compression is probed with exponential backoff when compression ratios are not better than 50%.