multi-threading option #185

tansy · 2025-02-12T14:52:28Z

We may add a support for multi-threading to lzbench in the future but it will be a new feature. Looking for volunteers :)

Had a look at how it is organised and thought: is there a way to simply pass an argument to the de/compressor?
Maybe, instead of multiplying function arguments like param1, param2... paramN,

inline int64_t lzbench_compress(lzbench_params_t *params, std::vector<size_t>& chunk_sizes, compress_func compress, std::vector<size_t> &compr_sizes, uint8_t *inbuf, uint8_t *outbuf, size_t outsize, size_t param1, size_t param2, char* workmem);

aggregate all those additional, optional parameters in a struct and pass a struct to the function.

inline int64_t lzbench_compress(lzbench_params_t *params, std::vector<size_t>& chunk_sizes, compress_func compress, std::vector<size_t> &compr_sizes, uint8_t *inbuf, uint8_t *outbuf, size_t outsize, struct codec_options *codec_params);

struct codec_options { size_t param1; size_t param2; /* ... */ size_t paramN; };

It would allow seamless extension in the future, as the only thing passed would be a pointer to the struct.
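To make the idea concrete, here is a minimal sketch of what such a struct could look like. Everything in it is an assumption for illustration: the field names, the `codec_fn` signature, and the `demo_memcpy` stand-in codec are hypothetical and not part of lzbench.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical aggregate of optional per-codec parameters.
// Field names are illustrative only, not lzbench's actual API.
struct codec_options {
    size_t   param1  = 0;        // e.g. compression level
    size_t   param2  = 0;        // e.g. codec-specific extra setting
    unsigned threads = 1;        // 1 means single-threaded
    char*    workmem = nullptr;  // optional scratch memory
};

// Hypothetical codec signature: one pointer instead of param1..paramN.
using codec_fn = int64_t (*)(const uint8_t* in, size_t insize,
                             uint8_t* out, size_t outsize,
                             const codec_options* opts);

// Stand-in "codec" that just copies its input, like the memcpy baseline.
inline int64_t demo_memcpy(const uint8_t* in, size_t insize,
                           uint8_t* out, size_t outsize,
                           const codec_options* opts) {
    (void)opts;                       // this codec ignores all options
    if (insize > outsize) return -1;  // output buffer too small
    std::memcpy(out, in, insize);
    return static_cast<int64_t>(insize);
}

// The caller forwards the options pointer untouched, so adding a field
// to codec_options later does not change this signature.
inline int64_t run_codec(codec_fn fn, const uint8_t* in, size_t insize,
                         uint8_t* out, size_t outsize,
                         const codec_options* opts) {
    return fn(in, insize, out, outsize, opts);
}
```

A codec that does not care about threads simply never reads `opts->threads`, while one that does can pick it up without any change to the call sites.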

That could also require a new column called 'threads' that would be turned on when the threads option is passed to the program. Just to do justice to that codec and to the others that don't necessarily use threads.
Well, at least show it after the header.

it's much easier to reach memory or storage limits

It's the user's choice.

what skews results

Like how?

only modern compressors support multithreading

And modern hardware, which is likely to be able to handle the load and requirements of the task. Again - user's choice.

should we compare e.g. a single threaded lzf with multithreaded lz4?

If it's clearly stated that it's a multi-threaded result, it's the user's choice.

even if a compressor support multithreading we will compare not just compressors but also how effective is its multi-threading implementation vs other implementations

It would only use 'internal', 'codec's implementation'. Again - user's choice.

Most, if not all, multi-threaded compressors I can think of operate at the client level, not the codec level. And I think it's a good thing.
I wouldn't worry about that here. If a codec offers multi-threading, it's their job to do it efficiently.

to make testing fair we can implement multi-threading in lzbench for all compressors and disable internal multi-threading implementation (but this is a new feature)

Again, if it was clearly marked as X-threaded, then it's up to the user to draw conclusions.

All in all, allowing the codec's internal threading is ok as long as it's clearly stated that it's a multi-threaded measurement.

inikep · 2025-02-13T16:35:28Z

aggregate all those additional, optional parameters in a struct and pass a struct to the function

We can do it

Like how?

Normally our results are CPU-bound. With multi-threading the results may be I/O-bound, so on the same machine you will get different results depending on the storage you are using.

It would only use 'internal', 'codec's implementation'.

Some MT implementations are done outside the compression library, and it will be hard to use them.
For example, it seems that MT for LZ4 is implemented in the programs directory:
https://github.com/inikep/lzbench/blob/master/lz4/programs/lz4io.c
https://github.com/inikep/lzbench/blob/master/lz4/programs/threadpool.c

Moreover, we will come back to the situation of kanzi MT vs lzf single-threaded. I know, it will be described, but still not fair IMHO :)

tansy · 2025-02-13T19:21:37Z

With multi-threading the results may be I/O-bound

I thought the whole corpus is read into memory. There is no 'I/O' in memory, is there?

Some MT implementations are done outside the compression library

Only pbzip2, lbzip2, pigz, plzip, bsc(?), bzip3, zstd-mt, rar (I guess). That's all I can think of, but I'm sure I'm missing some.

The 'internal' ones I know of are libbsc(?), libsais, fast-lzma2, and now kanzi.

I can see a pattern here. If you were to implement MT, it would be an 'external', client-side thing, and the bench would test the implementation rather than the codec. I wouldn't go this way.

Moreover, we will come back to the situation of kanzi MT vs lzf single-threaded. I know, it will be described, but still not fair IMHO

If it was clearly stated that it's multi-threaded, and how multi-threaded, then it's not a problem.

Could look like this:
regular result:

$ lzbench -ekanzi ./lzbench
lzbench 2.0.1 (64-bit Linux)

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                  11612 MB/s 11732 MB/s     7615680 100.00 ./lzbench
kanzi 2.3 -2             19.7 MB/s  80.6 MB/s     3272575  42.97 ./lzbench
kanzi 2.3 -3             12.9 MB/s  83.1 MB/s     3262039  42.83 ./lzbench
kanzi 2.3 -4             5.28 MB/s  20.3 MB/s     3104334  40.76 ./lzbench

Then the user specifies the threads option:
1 thread:

$ lzbench -T1 -ekanzi ./lzbench
lzbench 2.0.1 (64-bit Linux)
Using threads: 1, block size: x

Compressor name    Th     Compress. Decompress. Compr. size  Ratio Filename
memcpy              1     11612 MB/s 11732 MB/s     7615680 100.00 ./lzbench
kanzi 2.3 -2        1      19.7 MB/s  80.6 MB/s     3272575  42.97 ./lzbench
kanzi 2.3 -3        1      12.9 MB/s  83.1 MB/s     3262039  42.83 ./lzbench
kanzi 2.3 -4        1      5.28 MB/s  20.3 MB/s     3104334  40.76 ./lzbench

2 threads:

$ lzbench -T2 -ekanzi ./lzbench
lzbench 2.0.1 (64-bit Linux)
Using threads: 2, block size: x

Compressor name    Th     Compress. Decompress. Compr. size  Ratio Filename
memcpy              1     11612 MB/s 11732 MB/s     7615680 100.00 ./lzbench
kanzi 2.3 -2        2      30.6 MB/s   108 MB/s     3264768  42.87 ./lzbench
kanzi 2.3 -3        2      16.6 MB/s  84.1 MB/s     3256681  42.77 ./lzbench
kanzi 2.3 -4        2      6.19 MB/s  25.6 MB/s     3099558  40.70 ./lzbench

'block size' is an optional parameter to threads. Obviously it's not provided in the example, nor used by kanzi. Just thought it could be one of the parameters for threads. Doesn't have to be.

It could also be limited to one codec only when using threads. That would solve the problem of 'fairness'.
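The optional 'Th' column from the mock-ups above could be produced along these lines. This is only a sketch under assumptions: `bench_row` and `format_row` are hypothetical names, the column widths are arbitrary, and none of this reflects how lzbench actually prints its results.

```cpp
#include <cstdio>
#include <string>

// Hypothetical result record; not lzbench's internal structure.
struct bench_row {
    std::string        name;      // e.g. "kanzi 2.3 -2"
    unsigned           threads;   // threads the codec was asked to use
    double             cspeed;    // compression speed, MB/s
    double             dspeed;    // decompression speed, MB/s
    unsigned long long csize;     // compressed size in bytes
    double             ratio;     // compressed / original, in percent
    std::string        filename;
};

// Format one result line. The thread count is printed only when the
// user asked for threads, so the default output stays unchanged.
inline std::string format_row(const bench_row& r, bool show_threads) {
    char buf[256];
    if (show_threads) {
        std::snprintf(buf, sizeof buf,
                      "%-18s %2u %9.1f MB/s %9.1f MB/s %11llu %6.2f %s",
                      r.name.c_str(), r.threads, r.cspeed, r.dspeed,
                      r.csize, r.ratio, r.filename.c_str());
    } else {
        std::snprintf(buf, sizeof buf,
                      "%-18s %9.1f MB/s %9.1f MB/s %11llu %6.2f %s",
                      r.name.c_str(), r.cspeed, r.dspeed,
                      r.csize, r.ratio, r.filename.c_str());
    }
    return std::string(buf);
}
```

Keeping the column behind a flag means existing scripts that parse the default output would not be affected, while a threaded run always carries its thread count on every line.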

inikep · 2025-02-15T16:08:54Z

I thought the whole corpus is read into memory. There is no 'I/O' in memory, is there?

Agree. I got confused with database benchmarks :)

The 'internal' ones I know of are libbsc(?), libsais, fast-lzma2, and now kanzi.

Also zstd, but there are not many of them.

tansy · 2025-02-16T19:00:43Z

The 'internal' ones I know of are libbsc(?), libsais, fast-lzma2, and now kanzi.

Also zstd, but there are not many of them.

Does the zstd library use threads internally? I thought it was implemented in the client.

FYI, there is another one that uses threads.

inikep · 2025-02-17T13:20:43Z

It's in the lib:
https://github.com/inikep/lzbench/blob/master/zstd/lib/compress/zstd_compress.c#L6354-L6406
but you have to turn it on with -DZSTD_MULTITHREAD.
