This repository supports generating multi-layer transparent images (composed of multiple RGBA image layers) from a global text prompt and an anonymous region layout (bounding boxes without per-layer captions). The anonymous region layout can be either predicted by an LLM or specified manually by the user.
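For intuition, an anonymous region layout is just a global caption plus a set of uncaptioned bounding boxes. The sketch below is purely illustrative: the caption, coordinates, and variable names are made up, and the exact input format expected by this repository is defined by its own scripts (see `example.py` and `multi_layer_gen/test.py`).

```python
# Illustrative only: a global caption plus anonymous (uncaptioned) layer boxes.
# Coordinates are (x1, y1, x2, y2) in pixels on a 512x512 canvas; the real
# format used by this repo may differ.
global_caption = "A birthday card with balloons, a cake, and a greeting text"
layer_boxes = [
    (0, 0, 512, 512),      # background layer
    (40, 30, 230, 210),    # balloons
    (150, 260, 380, 470),  # cake
    (90, 480, 430, 510),   # greeting text
]
```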
- Anonymous Layout: Requires only a single global caption to generate multiple layers, eliminating the need for individual captions for each layer.
- High Layer Capacity: Supports the generation of 50+ layers, enabling complex multi-layer outputs.
- Efficiency: Maintains high efficiency compared to full attention and spatial-temporal attention mechanisms.
- Release inference code and pretrained model
- Release training code
```bash
conda create -n multilayer python=3.10 -y
conda activate multilayer
pip3 install torch==2.4.0 torchvision==0.19.0
pip install diffusers==0.31.0 transformers==4.44.0 accelerate==0.34.2 peft==0.12.0 datasets==2.20.0
pip install wandb==0.17.7 einops==0.8.0 sentencepiece==0.2.0 mmengine==0.10.4 prodigyopt==1.0
```
```bash
huggingface-cli login
```
To give it a quick try, run `example.py`:

```bash
python example.py
```
Create a directory `multi_layer_gen/checkpoints` and download the following checkpoints into it.
| Variable | Description | Action Required |
|---|---|---|
| `ckpt_dir` | Anonymous region transformer checkpoint | Download from Google Drive |
| `transp_vae_ckpt` | Multi-layer transparency decoder checkpoint | Download from Google Drive |
| `pre_fuse_lora_dir` | LoRA weights to be fused initially | Download from Google Drive |
| `extra_lora_dir` | Optional LoRA weights (for aesthetic improvement) | Download from Google Drive |
The downloaded checkpoints should be organized as follows:
```
checkpoints/
├── anonymous_region_transformer_ckpt/
│   ├── layer_pe.pth
│   └── pytorch_lora_weights.safetensors
├── extra_lora/
│   └── pytorch_lora_weights.safetensors
├── pre_fuse_lora/
│   └── pytorch_lora_weights.safetensors
└── transparent_decoder_ckpt.pt
```
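Before running inference, it can be worth confirming that every file in the layout above is actually in place. The snippet below is not part of the repository; it is an optional sanity check based solely on the directory layout shown above.

```python
# Optional sanity check: confirm the checkpoint files listed above exist.
from pathlib import Path

root = Path("multi_layer_gen/checkpoints")
expected = [
    "anonymous_region_transformer_ckpt/layer_pe.pth",
    "anonymous_region_transformer_ckpt/pytorch_lora_weights.safetensors",
    "extra_lora/pytorch_lora_weights.safetensors",
    "pre_fuse_lora/pytorch_lora_weights.safetensors",
    "transparent_decoder_ckpt.pt",
]
missing = [p for p in expected if not (root / p).is_file()]
print("All checkpoints found." if not missing else f"Missing files: {missing}")
```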
Then run the test script:

```bash
python multi_layer_gen/test.py \
    --cfg_path=multi_layer_gen/configs/multi_layer_resolution512_test.py \
    --save_dir=multi_layer_gen/output/ \
    --ckpt_dir=multi_layer_gen/checkpoints/anonymous_region_transformer_ckpt \
    --transp_vae_ckpt=multi_layer_gen/checkpoints/transparent_decoder_ckpt.pt \
    --pre_fuse_lora_dir=multi_layer_gen/checkpoints/pre_fuse_lora \
    --extra_lora_dir=multi_layer_gen/checkpoints/extra_lora
```
Please see `test.ipynb`.
```bash
conda create -n layoutplanner python=3.10 -y
conda activate layoutplanner
cd layout_planner
pip install -r requirements_part1.txt
pip install -r requirements_part2.txt
```
If you run into issues during installation:

- If you get `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`, run:

  ```bash
  apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ffmpeg libsm6 libxext6 -y
  ```

- If you need to add flash-attn-2, run:

  ```bash
  pip install flash-attn --no-build-isolation
  ```

  or get a prebuilt wheel URL from https://github.com/Dao-AILab/flash-attention/releases/ and install it directly:

  ```bash
  pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu118torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
  ```
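If you install FlashAttention-2, a quick import check can confirm that the wheel matches your Python, CUDA, and PyTorch versions. This check is not part of the repository; it only verifies that the package imports.

```python
# Optional check that flash-attn installed correctly in this environment.
import flash_attn

print(flash_attn.__version__)  # e.g. 2.5.8
```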
Edit the following parameters in `scripts/inference_template.sh`:
| Variable | Description | Action Required |
|---|---|---|
| `input_model` | Base model checkpoint location | Set to your downloaded model path (Download here) |
| `resume` | Path to the trained layout planner checkpoint | Set to your checkpoint path (Download here) |
| `width` | Width of the layout for inference | Set the layout width for inference |
| `height` | Height of the layout for inference | Set the layout height for inference |
| `save_path` | Path to save the generated output JSON | Set your desired save path |
| `do_sample` | Whether to use sampling for generation | Set `True` for sampling, `False` for greedy decoding |
| `temperature` | Sampling temperature when `do_sample=True` | Adjust as needed |
| `inference_caption` | User input describing the content of the desired layout | Provide the desired caption for layout generation |
Then run:

```bash
bash scripts/inference_template.sh
```
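The layout planner writes its result to the JSON file given by `save_path`. The snippet below is only a hypothetical way to inspect that file; the output schema is not documented here, and the path is a placeholder you should replace with your own `save_path`.

```python
# Hypothetical inspection of the planner's output; replace the path with your save_path.
import json

with open("output/layout.json") as f:
    layout = json.load(f)
print(json.dumps(layout, indent=2))  # inspect the predicted layout before multi-layer generation
```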