multi-threading option #185
We can do it
Normally our results are CPU-bound. With multi-threading the results may be I/O-bound, so on the same machine you will get different results depending on the storage you are using.
Some MT implementations are done outside the compression library, and it will be hard to use them. Moreover we will come back to situation of
I thought the whole corpus is read into memory. There is no 'I/O' in memory, is there?
Only pbzip2, lbzip2, pigz, plzip, bsc(?), bzip3, zstd-mt, rar (I guess); that's all I can think of, but I'm sure I'm missing some. The 'internal' ones I know of are libbsc(?), libsais, fast-lzma2, and now kanzi. I can see a pattern here. If you would implement MT yourself, it would be 'external', a 'client thing', and the bench would test the implementation rather than the codec. I wouldn't go this way.
If it was clearly stated that it's multi-threaded, and how multi-threaded, then it's not a problem. Could look like this:
Then user specifies threaded option:
2 threads:
'Block size' is an optional parameter to threads. Obviously it's not provided in the example, nor used by kanzi. I just thought it could be one of the parameters to threads. It doesn't have to be. Threads could also be limited to one codec only, when used. That would solve the problem of 'fairness'.
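To make the 'external' concern concrete, here is a rough sketch of what client-side threading with a block size would look like. Everything in it is hypothetical (`fake_compress`, `compress_blocks_mt`, and the strided work split are made up for illustration and are not lzbench or kanzi code). The point is that the splitting and scheduling live in the bench, so the timing would reflect this code as much as the codec.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Stand-in codec (assumption): copies the block unchanged.
// A real codec call would go here.
inline size_t fake_compress(const uint8_t* in, size_t n, uint8_t* out) {
    std::memcpy(out, in, n);
    return n;
}

// Hypothetical 'external' threading: split the input into blocks of
// block_size bytes and let num_threads workers take blocks in a strided
// pattern. Returns the compressed size of every block.
inline std::vector<size_t> compress_blocks_mt(const std::vector<uint8_t>& in,
                                              size_t block_size,
                                              unsigned num_threads) {
    assert(block_size > 0 && num_threads > 0);
    const size_t num_blocks = (in.size() + block_size - 1) / block_size;
    std::vector<size_t> sizes(num_blocks, 0);
    std::vector<uint8_t> out(in.size());  // enough because the fake codec never expands

    // Each worker writes only to its own blocks, so no locking is needed.
    auto worker = [&](unsigned tid) {
        for (size_t b = tid; b < num_blocks; b += num_threads) {
            const size_t off = b * block_size;
            const size_t len = std::min(block_size, in.size() - off);
            sizes[b] = fake_compress(in.data() + off, len, out.data() + off);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
    return sizes;
}
```

With this approach the block size and the way work is handed out are the bench's decisions, which is exactly why limiting threads to codecs that do it internally keeps the comparison cleaner.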
Agree. I got confused with database benchmarks :)
Also zstd but not so many of them.
Does the zstd library use threads internally? I thought it was implemented in the client. FYI, there is another one that uses threads.
It's in the lib:
I had a look at how it is organised and I thought: is there a way to simply pass an argument to the de/compressor?
Maybe, instead of multiplying the function's arguments, like param1, param2... paramN
inline int64_t lzbench_compress(lzbench_params_t *params, std::vector<size_t>& chunk_sizes, compress_func compress, std::vector<size_t> &compr_sizes, uint8_t *inbuf, uint8_t *outbuf, size_t outsize, size_t param1, size_t param2, char* workmem);
aggregate all those additional, optional parameters in a struct and pass a struct to the function.
inline int64_t lzbench_compress(lzbench_params_t *params, std::vector<size_t>& chunk_sizes, compress_func compress, std::vector<size_t> &compr_sizes, uint8_t *inbuf, uint8_t *outbuf, size_t outsize, struct codec_options *codec_params);
struct codec_options { size_t param1; size_t param2; /* ... */ size_t paramN; };
It would allow seamless extension in the future, as the only thing passed would be a pointer to the struct.
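A minimal sketch of the idea, just to show the shape. The field names (`level`, `window_log`, `threads`), the callback type, and `run_compress` are all made up for illustration; they are not lzbench's actual definitions, and the real `compress_func` has a different signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical aggregate of the optional codec parameters.
struct codec_options {
    size_t level = 0;         // stands in for param1
    size_t window_log = 0;    // stands in for param2
    size_t threads = 1;       // new option, added without touching any signature
    char*  workmem = nullptr; // scratch memory, as in the current signature
};

// Hypothetical callback type: every codec wrapper receives the same struct.
using compress_func = int64_t (*)(const uint8_t* in, size_t insize,
                                  uint8_t* out, size_t outsize,
                                  const codec_options* opts);

// Stand-in codec that copies the input; a real wrapper would read
// opts->threads and pass it on to the library.
inline int64_t copy_compress(const uint8_t* in, size_t insize,
                             uint8_t* out, size_t outsize,
                             const codec_options* opts) {
    (void)opts;  // unused by this stand-in
    if (insize > outsize) return -1;
    std::memcpy(out, in, insize);
    return static_cast<int64_t>(insize);
}

// Driver whose signature stays the same however many options are added.
inline int64_t run_compress(compress_func fn, const uint8_t* in, size_t insize,
                            uint8_t* out, size_t outsize,
                            const codec_options* opts) {
    assert(fn != nullptr && opts != nullptr);
    return fn(in, insize, out, outsize, opts);
}
```

Adding another option later means adding one field with a default value; wrappers that don't care about it simply ignore it.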
That could also require a new column called 'threads' that would be turned on when the threads option is passed to the program, just to do justice to that codec and to the others that don't necessarily use threads.
Well, at least to show it after header.
It's the user's choice.
Like how?
And modern hardware, which is likely to be able to handle the load and requirements of the task. Again, the user's choice.
If it's clearly stated that it's a multi-threaded result, it's the user's choice.
It would only use 'internal', 'codec's implementation'. Again - user's choice.
Most, if not all, multi-threaded compressors I can think of operate at the client level, not the codec level. And I think it's a good thing.
I wouldn't worry about that here. If codec offers multi-threading, it's their job to do it efficiently.
Again, if it was clearly marked as X-threaded, then it's up to the user to draw conclusions.
All in all, allowing the codec's 'internal' threading is OK as long as it's clearly stated that it's a multi-threaded measurement.