The official repo for MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
Download the weights and put them into the weight folder:
- DiffBIR (general_full_v1.ckpt): link
- Cheng2020-Tuned (cheng_small.pth.tar): link
If you want to use 'mask', download the CLIP_Surgery model and put the `clip` folder in the same directory as this project.
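
With the default names above, the expected layout is roughly as follows (a sketch; the project root and notebook filename are placeholders):

```
MISC/                         # project root (name is illustrative)
├── weight/
│   ├── general_full_v1.ckpt  # DiffBIR
│   └── cheng_small.pth.tar   # Cheng2020-Tuned
├── clip/                     # from CLIP_Surgery, only needed for 'mask'
└── demo.ipynb                # placeholder name for the provided notebook
```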
Run the ipynb code in one of the following modes to decompress the image (illustrative sketches follow the list):
- If you want pixel-instructed decoding, set the mode to 'pixel'. A larger `block_num_min` means more pixels, at a larger bpp cost.
- If you want net-instructed decoding, set the mode to 'net' to use our fine-tuned Cheng2020 network. You can also use your own network weights trained with CompressAI.
- If you want to use another model (e.g. VVC, HiFiC, ...) as the starting point of the diffusion, set the mode to 'ref', run your own model, and provide its decompressed image and bpp.
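
A minimal sketch of the mode settings described above, assuming variable names such as `mode`, `block_num_min`, `ref_image`, and `ref_bpp` (the exact names live in the notebook cells and may differ):

```python
# Hypothetical configuration cell; names are assumptions, not the exact notebook API.

mode = 'pixel'            # one of 'pixel', 'net', 'ref'

if mode == 'pixel':
    block_num_min = 4     # larger -> more pixel blocks -> higher bpp

elif mode == 'net':
    # fine-tuned Cheng2020 weights from the weight folder,
    # or your own CompressAI-trained checkpoint
    net_ckpt = 'weight/cheng_small.pth.tar'

elif mode == 'ref':
    # output of an external codec (e.g. VVC, HiFiC) used as the diffusion start
    ref_image = 'decoded_by_your_codec.png'
    ref_bpp = 0.02        # bitrate actually spent by your codec
```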
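
For 'net' mode with your own weights, a loading sketch using the CompressAI model zoo (assuming a plain `state_dict`-style checkpoint; the bundled `cheng_small.pth.tar` may require the repository's own loading code instead):

```python
import torch
from compressai.zoo import cheng2020_anchor

# Untrained Cheng2020 architecture; the quality index here is an assumption
net = cheng2020_anchor(quality=1, pretrained=False)

# Load your CompressAI-trained weights (checkpoint layout may differ)
state = torch.load('weight/cheng_small.pth.tar', map_location='cpu')
net.load_state_dict(state.get('state_dict', state))

net.update()  # rebuild entropy-coder CDF tables before compressing
net.eval()
```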
[Feb 29, 2024] A simple Jupyter demo has been uploaded. The encoder and decoder model weights will be uploaded soon.
[Apr 24, 2024] The model weights have been uploaded. Please follow the instructions when using the ipynb file. We are working on a pipeline for encoding/decoding a group of images.
If you find our work useful, please cite our paper as:
@misc{li2024misc,
      title={MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model},
      author={Chunyi Li and Guo Lu and Donghui Feng and Haoning Wu and Zicheng Zhang and Xiaohong Liu and Guangtao Zhai and Weisi Lin and Wenjun Zhang},
      year={2024},
      eprint={2402.16749},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}