vq-compress

Image compression and reconstruction using pretrained autoencoder and VQGAN first-stage models from the latent diffusion and taming transformers repos. Model code and configs are copied from these repos with unnecessary parts removed.

Saves the autoencoding model's encoded output as a compressed format. This output is passed to the decoder on the receiver side to reconstruct a lossy compressed version of the original image. Depending on the chosen settings of the autoencoding model, either the encoded output or, in the case of VQGAN, its codebook indices can be saved and used for reconstruction.

To save VRAM and avoid extra processing, only the encoder or decoder weights are loaded, depending on whether the task is compression or decompression. Training code has been removed, but models trained with the original repos should still load.
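
A minimal sketch of that selective loading idea (the key prefixes follow the latent diffusion checkpoint layout; the actual filtering in this repo may differ):

import torch

def load_partial_state(ckpt_path, decompress=False):
    # Keep only decoder weights for decompression, encoder weights otherwise.
    prefix = "decoder." if decompress else "encoder."
    state = torch.load(ckpt_path, map_location="cpu")["state_dict"]
    return {k: v for k, v in state.items() if k.startswith(prefix)}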

Compressed data is saved in safetensors format. If a batch size larger than 1 is used for compression, each output file contains the encoded tensor for the whole batch.
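
As a minimal sketch, a saved output can be inspected with the safetensors library (the file name and tensor keys below are placeholders, not necessarily this repo's actual naming):

from safetensors.torch import load_file

# Sketch: inspect a compressed output file.
tensors = load_file("DEST_PATH/output.safetensors")
for name, t in tensors.items():
    print(name, t.shape, t.dtype)  # e.g. a (batch, c, h, w) latent for the whole batch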

The vq-f4, vq-f8, kl-f4, and kl-f8 configs provide the best reconstruction results.

Compressing with a VQGAN model, by removing --kl and adding --vq_ind for vq-f4 or vq-f8, should provide the best compression ratio. Additionally, running the saved output through a zip program may give better quality and file size reduction than JPEG with quality reduced to around 60 percent. A good pretrained vq-f8 reconstruction model followed by zip compression may give the best results in terms of file size.
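
For instance, a minimal sketch of the zip step using Python's standard library (paths are placeholders):

import zipfile
from pathlib import Path

# Sketch: pack the saved .safetensors outputs into a deflate-compressed zip.
with zipfile.ZipFile("compressed_outputs.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for f in Path("DEST_PATH").glob("*.safetensors"):
        zf.write(f, arcname=f.name)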

When using --vq_ind, also setting --ind_bit to 8 should give the most compressed output, though not the best quality. It will not work with most configs, since the saved index values must fall in the 0-255 range; only the vq-f8-n256 config (256 codebook entries) and its associated model will work with --ind_bit set to 8.
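
A quick illustration of the range constraint, using numpy only for demonstration:

import numpy as np

# uint8 holds only 0-255, so indices from codebooks with more than
# 256 entries wrap around silently; int16 is safe up to 32767.
indices = np.array([0, 255, 256])
print(indices.astype(np.uint8))   # [  0 255   0] -- 256 wraps, data corrupted
print(indices.astype(np.int16))   # [  0 255 256] -- preserved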

Install

Run the following command in the folder containing setup.py before using the library.

pip install -e .

Commands

Compress

kl compress:

python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --kl --batch 2 --img_size 384

vq compress with indices:

python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --batch 1 --img_size 512 --vq_ind --ind_bit 16

Decompress

kl decompress:

python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --kl --dc

vq decompress with indices:

python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --dc --vq_ind

Flags

If the --dc flag is provided, it runs decompression; otherwise it compresses the input.

--aspect resizes the image keeping its aspect ratio, with the smaller dimension set to --img_size. May fail for large images that do not fit in GPU memory.
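
A sketch of the resize logic this flag describes (an illustration, not the repo's actual code):

from PIL import Image

def resize_keep_aspect(img: Image.Image, img_size: int) -> Image.Image:
    # Scale so the smaller side equals img_size; the larger side grows
    # proportionally, which is why very large images can exhaust GPU memory.
    w, h = img.size
    scale = img_size / min(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)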

--ind_bit, with possible values 8 or 16, saves the VQGAN indices as uint8 or int16 respectively, reducing the compressed output file size. Only needed for compression.

--xformers uses xformers if available to reduce memory consumption, and may also increase speed.

--float16 processes in float16 precision to reduce memory consumption.

Currently 3 types of data compression are available, as sketched after this list.

  • With --kl, the pretrained KL autoencoder's encode output is saved.
  • If --kl is not specified, the VQGAN encode output is saved.
  • If --vq_ind is specified, the codebook indices are saved; these are used to reconstruct the image.
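
A rough sketch of how these modes map onto the first-stage model interfaces from the latent diffusion and taming transformers repos (method names follow those repos; the exact wiring in this codebase may differ):

import torch

@torch.no_grad()
def encode_for_compression(model, x, kl=False, vq_ind=False):
    if kl:
        # AutoencoderKL.encode returns a DiagonalGaussianDistribution.
        return model.encode(x).mode()     # --kl: save the latent tensor
    quant, _, info = model.encode(x)      # VQModel.encode
    if vq_ind:
        return info[2]                    # --vq_ind: save codebook indices
    return quant                          # default: save quantized latents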

Pretrained Models and Configs

Original configs can be found here. More weights can be found in the latent diffusion repo. The ru-dalle vq-f8-gumbel model trained with the taming transformers repo can also be used.

For kl-f8, the Stable Diffusion VAE ckpt can be used. It gives 8x downsampling.

For kl-f4 config,

For vq-f4 config,

The following may provide better compression rates, but there may be noticeable degradation in reconstructed images.

For vq-f8 config,

For vq-f8-n256 config,

For kl-f16 config,

For kl-f32 config,

For vq-f8-gumbel config,

For vq-f8-rudalle config,

References