Commit 28796e7

Initial tests with 2.3.0
1 parent 32f9b75 commit 28796e7

1 file changed: +60 -43 lines changed

README.md

Lines changed: 60 additions & 43 deletions
@@ -8,8 +8,11 @@ Kanzi is a modern, modular, expandable and efficient lossless data compressor im
 * expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
 * efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).

-Unlike the most common lossless data compressors, Kanzi uses a variety of different compression algorithms and supports a wider range of compression ratios as a result. Most usual compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is multithreaded by design and uses several threads by default to compress blocks concurrently. It is not compatible with standard compression formats.
-Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery. It also lacks data deduplication across files.
+Unlike the most common lossless data compressors, Kanzi uses a variety of different compression algorithms and supports a wider range of compression ratios as a result.
+Most usual compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is concurrent by design and uses threads to compress several blocks in parallel. It is not compatible with standard compression formats.
+
+Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery.
+It also lacks data deduplication across files. However, Kanzi generates a bitstream that is seekable (one or several consecutive blocks can be decompressed without the need for the whole bitstream to be decompressed).


 For more details, check https://github.com/flanglet/kanzi/wiki.
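
The README text in this hunk describes two properties: blocks are compressed in parallel, and one block can be decompressed without decoding the whole bitstream. The sketch below illustrates that general idea only. It does not use Kanzi's API or bitstream format: `java.util.zip.Deflater` stands in for the codec, and `BlockSketch`, `compressBlock`, `decompressBlock` and `compressAll` are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockSketch {
    // Compress one block on its own; no state is shared with other blocks.
    // Deflater is a stand-in codec (assumption), not what Kanzi uses.
    public static byte[] compressBlock(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a single block without touching any other block.
    public static byte[] decompressBlock(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && inflater.needsInput()) {
                break; // truncated input: stop instead of looping forever
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    // Split the input into fixed-size blocks and compress them concurrently.
    // The returned list keeps the blocks in their original order.
    public static List<byte[]> compressAll(byte[] data, int blockSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int off = 0; off < data.length; off += blockSize) {
                byte[] block = Arrays.copyOfRange(data, off, Math.min(data.length, off + blockSize));
                futures.add(pool.submit(() -> compressBlock(block)));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                result.add(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        int blockSize = 256 * 1024;
        List<byte[]> blocks = compressAll(data, blockSize, 4);

        // "Seek": decode only block 2 and compare it with the matching input slice.
        byte[] block2 = decompressBlock(blocks.get(2));
        byte[] expected = Arrays.copyOfRange(data, 2 * blockSize, 3 * blockSize);
        System.out.println("block 2 round-trips: " + Arrays.equals(block2, expected));
    }
}
```

Because each block is encoded independently, a reader only needs to know where a block starts to decode it. A real format would also have to record block boundaries in the stream, which this sketch sidesteps by keeping the blocks in a list.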
@@ -22,7 +25,8 @@ There is Go implementation available here: https://github.com/flanglet/kanzi-go


 ![Build Status](https://github.com/flanglet/kanzi/actions/workflows/ant.yml/badge.svg)
-
+[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=flanglet_kanzi&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=flanglet_kanzi)
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)


 ## Why Kanzi
@@ -31,7 +35,7 @@ There are many excellent, open-source lossless data compressors available alread

 If gzip is starting to show its age, zstd and brotli are open-source, standardized and used
 daily by millions of people. Zstd is incredibly fast and probably the best choice in many cases.
-There are a few scenarios where Kanzi could be a better choice:
+There are a few scenarios where Kanzi can be a better choice:

 - gzip, lzma, brotli, zstd are all LZ based. It means that they can reach certain compression
 ratios only. Kanzi also makes use of BWT and CM which can compress beyond what LZ can do.
@@ -46,26 +50,28 @@ at compression time to better compress specific kinds of data.

 - Kanzi can take advantage of the multiple cores of a modern CPU to improve performance

-- It is easy to implement a new transform or entropy codec to either test an idea or improve
-compression ratio on specific kinds of data.
-
-
+- Implementing a new transform or entropy codec (to either test an idea or improve compression ratio on specific kinds of data) is simple.
+

 ## Benchmarks

 Test machine:

 AWS c5a8xlarge: AMD EPYC 7R32 (32 vCPUs), 64 GB RAM

-openjdk 21.0.1+12-29
+openjdk 21.0.3 2024-04-16
+
+Ubuntu 24.04 LTS
+
+Kanzi version 2.3.0 Java

-Ubuntu 22.04.3 LTS
+On this machine, Kanzi uses up to 16 threads (half of CPUs by default).

-Kanzi version 2.2 Java implementation.
+bzip3 uses 16 threads. zstd uses 16 threads for compression and 1 for decompression,
+other compressors are single threaded.

-On this machine kanzi can use up to 16 threads (depending on compression level).
-bzip3 uses 16 threads. zstd can use 2 for compression, other compressors
-are single threaded.
+The default block size at level 9 is 32MB, severely limiting the number of threads
+in use, especially with enwik8, but all tests are performed with default values.


 ### silesia.tar
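
The benchmark note in this hunk says the 32MB default block size at level 9 limits the number of threads in use, especially with enwik8. A little arithmetic shows why. The sketch below rests on assumptions not stated in the diff: one thread handles one block at a time, and "32MB" means 32 MiB (33,554,432 bytes). The file sizes are the "Original" rows of the two benchmark tables. `BlockCount`, `blocks` and `usableThreads` are hypothetical names.

```java
public class BlockCount {
    // Number of blocks needed to cover `size` bytes (ceiling division).
    public static long blocks(long size, long blockSize) {
        return (size + blockSize - 1) / blockSize;
    }

    // Upper bound on threads that can be busy at once, assuming one block per thread.
    public static long usableThreads(long size, long blockSize, int maxThreads) {
        return Math.min(maxThreads, blocks(size, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 32L * 1024 * 1024; // assumption: "32MB" read as 32 MiB
        int maxThreads = 16;                // "up to 16 threads" on the test machine
        long enwik8 = 100_000_000L;         // "Original" size in the enwik8 table
        long silesia = 211_957_760L;        // "Original" size in the silesia.tar table

        System.out.println("enwik8: " + blocks(enwik8, blockSize) + " blocks, at most "
                + usableThreads(enwik8, blockSize, maxThreads) + " threads busy");
        System.out.println("silesia.tar: " + blocks(silesia, blockSize) + " blocks, at most "
                + usableThreads(silesia, blockSize, maxThreads) + " threads busy");
    }
}
```

Under these assumptions enwik8 splits into 3 blocks and silesia.tar into 7, so neither input can keep 16 threads busy at level 9. Reading "32MB" as 32,000,000 bytes would give 4 blocks for enwik8, which leads to the same conclusion.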
@@ -75,28 +81,28 @@ Download at http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip
 | Compressor | Encoding (sec) | Decoding (sec) | Size |
 |---------------------------------|-----------------|-----------------|------------------|
 |Original | | | 211,957,760 |
-|**Kanzi -l 1** | **1.337** | **1.186** | **80,284,705** |
-|lz4 1.9.5 -4 | 3.397 | 0.987 | 79,914,864 |
-|Zstd 1.5.5 -2 | 0.761 | 0.286 | 69,590,245 |
-|**Kanzi -l 2** | **1.343** | **1.343** | **68,231,498** |
-|Brotli 1.1.0 -2 | 1.749 | 2.459 | 68,044,145 |
-|Gzip 1.10 -9 | 20.15 | 1.316 | 67,652,229 |
-|**Kanzi -l 3** | **1.906** | **1.692** | **64,916,444** |
-|Zstd 1.5.5 -5 | 2.003 | 0.324 | 63,103,408 |
-|**Kanzi -l 4** | **2.458** | **2.521** | **60,770,201** |
-|Zstd 1.5.5 -9 | 4.166 | 0.282 | 59,444,065 |
-|Brotli 1.1.0 -6 | 14.53 | 4.263 | 58,552,177 |
-|Zstd 1.5.5 -13 | 19.15 | 0.276 | 58,061,115 |
-|Brotli 1.1.0 -9 | 70.07 | 7.149 | 56,408,353 |
+|**Kanzi -l 1** | **1.137** | **1.153** | **80,277,212** |
+|Lz4 1.9.5 -4 | 0.321 | 0.330 | 79,912,419 |
+|Zstd 1.5.6 -2 -T16 | 0.151 | 0.271 | 69,556,157 |
+|**Kanzi -l 2** | **1.082** | **1.313** | **68,195,845** |
+|Brotli 1.1.0 -2 | 1.749 | 0.761 | 68,041,629 |
+|Gzip 1.12 -9 | 20.09 | 1.403 | 67,652,449 |
+|**Kanzi -l 3** | **1.884** | **1.624** | **65,613,695** |
+|Zstd 1.5.6 -5 -T16 | 0.356 | 0.289 | 63,131,656 |
+|**Kanzi -l 4** | **2.548** | **2.420** | **61,249,959** |
+|Zstd 1.5.5 -9 -T16 | 0.690 | 0.278 | 59,429,335 |
+|Brotli 1.1.0 -6 | 8.388 | 0.677 | 58,571,909 |
+|Zstd 1.5.6 -13 -T16 | 3.244 | 0.272 | 58,041,112 |
+|Brotli 1.1.0 -9 | 70.07 | 0.677 | 56,376,419 |
 |Bzip2 1.0.8 -9 | 16.94 | 6.734 | 54,572,500 |
-|**Kanzi -l 5** | **3.228** | **2.268** | **54,051,139** |
-|Zstd 1.5.5 -19 | 92.82 | 0.302 | 52,989,654 |
-|**Kanzi -l 6** | **4.950** | **2.522** | **49,517,823** |
-|Lzma 5.2.5 -9 | 92.6 | 3.075 | 48,744,632 |
-|**Kanzi -l 7** | **4.478** | **3.181** | **47,308,484** |
+|**Kanzi -l 5** | **3.270** | **2.143** | **54,039,773** |
+|Zstd 1.5.6 -19 -T16 | 20.87 | 0.303 | 52,889,925 |
+|**Kanzi -l 6** | **4.506** | **2.256** | **49,567,817** |
+|Lzma 5.4.5 -9 | 95.97 | 3.172 | 48,745,354 |
+|**Kanzi -l 7** | **4.246** | **3.251** | **47,520,629** |
 |bzip3 1.3.2.r4-gb2d61e8 -j 16 | 2.682 | 3.221 | 47,237,088 |
-|**Kanzi -l 8** | **10.67** | **11.13** | **43,247,248** |
-|**Kanzi -l 9** | **24.78** | **26.73** | **41,807,179** |
+|**Kanzi -l 8** | **9.549** | **9.983** | **43,167,429** |
+|**Kanzi -l 9** | **26.95** | **28.31** | **41,497,835** |
 |zpaq 7.15 -m5 -t16 | 213.8 | 213.8 | 40,050,429 |


@@ -108,15 +114,26 @@ Download at https://mattmahoney.net/dc/enwik8.zip
 | Compressor | Encoding (sec) | Decoding (sec) | Size |
 |------------------------|------------------|------------------|------------------|
 |Original | | | 100,000,000 |
-|**Kanzi -l 1** | **1.221** | **0.684** | **43,747,730** |
-|**Kanzi -l 2** | **1.254** | **0.907** | **37,745,093** |
-|**Kanzi -l 3** | **1.093** | **0.989** | **33,839,184** |
-|**Kanzi -l 4** | **1.800** | **1.648** | **29,598,635** |
-|**Kanzi -l 5** | **2.066** | **1.740** | **26,527,955** |
-|**Kanzi -l 6** | **2.648** | **1.743** | **24,076,669** |
-|**Kanzi -l 7** | **3.742** | **1.741** | **22,817,376** |
-|**Kanzi -l 8** | **6.619** | **6.633** | **21,181,978** |
-|**Kanzi -l 9** | **17.81** | **18.23** | **20,035,133** |
+|**Kanzi -l 1** | **1.140** | **0.596** | **43,746,017** |
+|**Kanzi -l 2** | **1.040** | **0.720** | **37,816,913** |
+|**Kanzi -l 3** | **1.148** | **0.892** | **33,865,383** |
+|**Kanzi -l 4** | **1.321** | **1.566** | **29,597,577** |
+|**Kanzi -l 5** | **1.751** | **1.649** | **26,528,023** |
+|**Kanzi -l 6** | **2.954** | **1.319** | **24,076,674** |
+|**Kanzi -l 7** | **3.234** | **2.322** | **22,817,373** |
+|**Kanzi -l 8** | **6.836** | **6.741** | **21,181,983** |
+|**Kanzi -l 9** | **17.99** | **18.41** | **20,035,138** |
+
+
+## Build
+
+First option (ant):
+
+```ant```
+
+Second option (maven):
+
+```mvn -Dmaven.test.skip=true```


 Credits
