Decoding multiple frames without decoder create/destroy #306

Open
slugwarz05 opened this issue Mar 4, 2024 · 6 comments

Comments

@slugwarz05

Similar to #86 for encoding, it would be great to have a decoder-side counterpart to charls_jpegls_encoder_rewind.

The ideal state would be if, when decoding multiple same-sized images, no additional new/delete calls are made. This would be achieved in part by not needing to create and destroy a charls_jpegls_decoder object for each image that needs decoding.

Or, if it is possible to calculate memory needs for the encode/decode operation, enabling custom allocators would solve this as well. But that seems like a longer-term enhancement.

@vbaderks
Contributor

vbaderks commented Mar 5, 2024

Hi,

The included test application CharLSTest can be used to run performance tests. For example, -decodeperformance:10 decodes the same image 10 times.
If I execute this perf test in a profiler, 99% of the CPU time is spent in the method decode_lines() or one of its children.

Do you have a specific scenario for which you see a significant performance improvement by re-using the same decoder instance?

Note: internally, CharLS will also perform 2 calls to new:

  • 1 call to allocate a buffer to store 2 scan lines (size depends on the size of the image that needs to be decoded)
  • 1 call to construct the scan_decoder
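For a rough sense of how small the first of these allocations is, a back-of-the-envelope estimate can be sketched as below. This is not CharLS's actual sizing code: the function name `two_line_buffer_bytes` is hypothetical, and any edge padding the library may add around each line is ignored.

```cpp
#include <cstddef>

// Hypothetical helper, not part of CharLS: estimates the bytes needed to hold
// two scan lines, ignoring any padding the library may add around each line.
constexpr std::size_t two_line_buffer_bytes(std::size_t width,
                                            std::size_t component_count,
                                            std::size_t bytes_per_sample) {
    return 2 * width * component_count * bytes_per_sample;
}
```

Under these assumptions, a 640-pixel-wide, single-component image with 2 bytes per sample gives `two_line_buffer_bytes(640, 1, 2)`, which is 2560 bytes.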

@slugwarz05
Author

Hello,

The specific scenario I have is one where thousands of monochrome, 16 bit-depth, and same-sized images are compressed with JPEG-LS and placed into a binary file. Very similar to the basic operating principle of MJPEG, except other structures besides imagery are also embedded in the file. Due to the high quantity of imagery, it is prudent to minimize superfluous calls to new/delete.

I believe I was able to accomplish this on the encoding side through use of charls_jpegls_encoder_rewind, but would like a way to do this for decoding as well.

@vbaderks
Contributor

I will see if it is possible to create a proof of concept for charls_jpegls_decoder_rewind.

@vbaderks
Contributor

I have created in the branch rewind-poc a version that has a method charls_jpegls_decoder_rewind.
If you see a significant performance improvement, let me know.

Note: deep inside the decoding, 2 allocations are still performed. It might be possible to reuse these as well.
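One way such an allocation might be reused is to keep the scratch buffer alive between frames and reallocate only when the next frame needs more space. The sketch below illustrates that idea with a stand-in class; `ReusableLineBuffer`, `prepare`, and `allocation_count` are hypothetical names and do not reflect how CharLS or the rewind-poc branch is actually structured.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in, not CharLS code: a scratch buffer that survives
// between frames and only grows when a frame needs more space.
class ReusableLineBuffer {
public:
    // Returns a buffer of at least `bytes` bytes, reallocating only if the
    // current size is too small.
    std::uint8_t* prepare(std::size_t bytes) {
        if (bytes > storage_.size()) {
            storage_.resize(bytes);  // the only place memory is (re)allocated
            ++allocation_count_;
        }
        return storage_.data();
    }

    // Number of times prepare() had to grow the buffer.
    std::size_t allocation_count() const { return allocation_count_; }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t allocation_count_ = 0;
};
```

With this approach, calling `prepare` with the same size for every frame in a sequence of same-sized images grows the buffer once, on the first frame.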

@slugwarz05
Author

Much appreciated! I pulled the branch down and found that my rather old C++ compiler (VS2015) does not support CharLS 3.x, with its migration to C++17. I was originally working from the 2.4.2 tag.

I will attempt a manual merge/transcription of the commit's substantive changes for throughput testing.

@slugwarz05
Author

I completed a manual merge/transcription into the 2.4.2 baseline. The primary inconsistency is that the new function is_compatible omits the thresholds/reset_value checks, due to where the function lives in the 2.4.2 decoder. I then ran a test case where 36,338 frames of 640x512 resolution (16-bit monochrome) were, in sequence:

  1. Encoded, and the same encoded buffer was used to...
  2. Decode by creating/destroying a charls_jpegls_decoder object
  3. Decode by maintaining a charls_jpegls_decoder object and calling charls_jpegls_decoder_rewind after each decode

Memory usage went up by 100 kB for the duration of the test (that is the finest resolution Task Manager gave me). The difference in average decode time was negligible for my case: about 0.4%, and interestingly it was decoding with rewind that was slower.
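For anyone wanting to repeat this kind of comparison, a minimal timing harness could look like the sketch below. The helper name `average_microseconds` is hypothetical, the callable passed in would wrap one decode using either strategy (create/destroy per frame, or rewind and reuse), and nothing here calls CharLS itself.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `decode_one_frame` once per frame and returns the
// mean wall-clock time per call in microseconds.
template <typename DecodeFn>
double average_microseconds(std::size_t frame_count, DecodeFn&& decode_one_frame) {
    if (frame_count == 0) {
        return 0.0;  // avoid dividing by zero when there is nothing to time
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < frame_count; ++i) {
        decode_one_frame();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count() /
           static_cast<double>(frame_count);
}
```

Timing both strategies over the same encoded buffer, and swapping the order in which they run between repetitions, helps separate a real difference from noise when the gap is well under one percent.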

I'll defer to you for the determination on whether these results warrant being part of the next tag.
