
[WIP] sstable: add adaptive compression #4497


Draft
wants to merge 1 commit into
base: master
from

Conversation

EdwardX29
Contributor

Add adaptive compression with zstd and snappy. zstd compression is probed with exponential backoff when compression ratios are not better than 50%.
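
For readers who want a concrete picture of "probed with exponential backoff", here is a minimal sketch of the decision logic only. It is not the code in this PR: the `prober` type, its fields, the `maxSkip` cap, and the reading of "better than 50%" as "compressed size under half the uncompressed size" are all assumptions made for illustration. The actual compression calls are left out.

```go
package main

import "fmt"

// Codec names the compressor chosen for a block. These identifiers are
// hypothetical and are not taken from the PR.
type Codec int

const (
	Snappy Codec = iota
	Zstd
)

// prober tracks when zstd should next be attempted (illustrative only).
type prober struct {
	threshold float64 // zstd is worthwhile if compressed/uncompressed < threshold
	maxSkip   int     // assumed upper bound on the backoff interval, in blocks
	nextSkip  int     // blocks to skip after the next failed probe
	skipLeft  int     // blocks still to skip before probing again
}

func newProber(threshold float64, maxSkip int) *prober {
	return &prober{threshold: threshold, maxSkip: maxSkip, nextSkip: 1}
}

// Next returns the codec to try for the next block.
func (p *prober) Next() Codec {
	if p.skipLeft > 0 {
		p.skipLeft--
		return Snappy
	}
	return Zstd
}

// Report records the result of a zstd attempt on one block.
func (p *prober) Report(uncompressedLen, compressedLen int) {
	if float64(compressedLen) < p.threshold*float64(uncompressedLen) {
		p.nextSkip = 1 // zstd paid off: reset the backoff
		return
	}
	// zstd did not pay off: skip it for nextSkip blocks, then double the interval.
	p.skipLeft = p.nextSkip
	p.nextSkip *= 2
	if p.nextSkip > p.maxSkip {
		p.nextSkip = p.maxSkip
	}
}

func main() {
	p := newProber(0.5, 8)
	// Simulate blocks for which zstd only reaches 90% of the original size.
	for i := 0; i < 16; i++ {
		if p.Next() == Zstd {
			p.Report(100, 90)
			fmt.Print("Z") // zstd probe
		} else {
			fmt.Print("S") // snappy, zstd skipped
		}
	}
	fmt.Println()
}
```

On data that never meets the threshold, the gaps between zstd probes grow as 1, 2, 4, 8 blocks and then stay at the cap, so the CPU spent on failed zstd attempts shrinks quickly. A single successful probe resets the interval.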

@cockroach-teamcity
Member

This change is Reviewable

@EdwardX29 EdwardX29 force-pushed the edwardx29/add-adaptive-compressor branch 6 times, most recently from 7d08f03 to c86227b Compare April 9, 2025 03:36
@EdwardX29
Contributor Author

EdwardX29 commented Apr 9, 2025

Uncompressible Workload Stats

| Configuration | Benchmark | ops | ops/sec | read | write | r-amp | w-amp | Total size of tables |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Adaptive with 80% heuristic | Benchmarkycsb/A/values=1000-10000 | 16552 | 27.6 | 85997057619 | 223965901255 | 8.32 | 1.63 | 141MB |
| Adaptive with 50% heuristic | Benchmarkycsb/A/values=1000-10000 | 16439 | 27.4 | 86044986132 | 224066957272 | 8.25 | 1.63 | 156MB |
| Zstd | Benchmarkycsb/A/values=1000-10000 | 16931 | 28.2 | 81646976354 | 224230627760 | 8.47 | 1.57 | 138MB |
| Snappy | Benchmarkycsb/A/values=1000-10000 | 16573 | 27.6 | 86924167042 | 224030946919 | 7.92 | 1.64 | 150MB |

50% compressibility Workload Stats

| Configuration | Benchmark | ops | ops/sec | read | write | r-amp | w-amp | Size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Adaptive with 80% heuristic | Benchmarkycsb/A/values=1000-10000/2 | 17295 | 28.8 | 80832314381 | 224344394838 | 6.43 | 1.56 | 103MB |
| Adaptive with 50% heuristic | Benchmarkycsb/A/values=1000-10000/2 | 17077 | 28.5 | 81497578031 | 224367456236 | 6.01 | 1.57 | 73MB |
| Zstd | Benchmarkycsb/A/values=1000-10000/2 | 19529 | 32.5 | 63015426484 | 224549902202 | 6.06 | 1.39 | 44MB |
| Snappy | Benchmarkycsb/A/values=1000-10000/2 | 16977 | 28.3 | 82733126436 | 224341226716 | 6.32 | 1.59 | 102MB |

90% compressibility Workload Stats

| Configuration | Benchmark | ops | ops/sec | read | write | r-amp | w-amp | Size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Adaptive with 80% heuristic | Benchmarkycsb/A/values=1000-10000/10.0 | 23742 | 39.6 | 27523974552 | 225339873543 | 3.38 | 1.14 | 7.3MB |
| Adaptive with 50% heuristic | Benchmarkycsb/A/values=1000-10000/10.0 | 23527 | 39.2 | 27682693967 | 225420934838 | 3.40 | 1.14 | 7.5MB |
| Zstd | Benchmarkycsb/A/values=1000-10000/10 | 23680 | 39.5 | 27894399045 | 225267953317 | 3.30 | 1.14 | 7.7MB |
| Snappy | Benchmarkycsb/A/values=1000-10000/10 | 22679 | 37.8 | 36271308951 | 225194025507 | 2.96 | 1.19 | 9.2MB |

Collaborator

@petermattis petermattis left a comment


Out of curiosity, are you specifying target compression numbers for these benchmarks? I think we'd want to have benchmarks with values which compress well with both Zstd and Snappy, values which compress well only with Zstd and not Snappy, and values that don't compress well at all.

The adaptive heuristic you're using seems to be prioritizing Zstd compression even if that compression is not significantly better than Snappy. This might be a pessimization if Zstd is consuming more CPU than Snappy. Concretely, I can imagine data that gets better than 50% compression with both Zstd and Snappy (i.e. highly compressible data), but uses less CPU to compress using Snappy.
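
To make the concern concrete, here is a hypothetical sketch of a check that compares the two codecs against each other instead of comparing zstd against a fixed ratio. Nothing here comes from the PR: `preferZstd` and the `minGain` parameter are invented for illustration, and CPU cost is not measured at all, only output size.

```go
package main

import "fmt"

// preferZstd reports whether zstd's output is smaller than Snappy's by at
// least the fraction minGain. Hypothetical helper, not part of the PR.
func preferZstd(snappyLen, zstdLen int, minGain float64) bool {
	if snappyLen <= 0 {
		return false
	}
	gain := float64(snappyLen-zstdLen) / float64(snappyLen)
	return gain >= minGain
}

func main() {
	// A 1000-byte block where both codecs do better than 50%:
	// Snappy reaches 400 bytes, zstd 380. zstd saves only 5% over Snappy.
	fmt.Println(preferZstd(400, 380, 0.2)) // false

	// A 1000-byte block where only zstd compresses well:
	// Snappy reaches 900 bytes, zstd 450. zstd saves 50% over Snappy.
	fmt.Println(preferZstd(900, 450, 0.2)) // true
}
```

With a check like this, the first block would stay on Snappy even though zstd clears the 50% bar, which is the case described above. The cost is that both codecs have to run on probed blocks to get the two sizes.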

Reviewable status: 0 of 6 files reviewed, all discussions resolved

@EdwardX29 EdwardX29 force-pushed the edwardx29/add-adaptive-compressor branch from c86227b to adcbb03 Compare April 13, 2025 16:01
@EdwardX29
Contributor Author

EdwardX29 commented Apr 14, 2025

Adaptive with 50% heuristic
[image]

Adaptive with 80% heuristic
[image]

@EdwardX29 EdwardX29 force-pushed the edwardx29/add-adaptive-compressor branch from adcbb03 to ec2ecd9 Compare April 15, 2025 14:42

3 participants