ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

arXiv · Project Page · Model

This repository supports generating multi-layer transparent images (composed of multiple RGBA image layers) from a global text prompt and an anonymous region layout (bounding boxes without layer captions). The anonymous region layout can either be predicted by an LLM or specified manually by the user.
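
Conceptually, the model takes one global caption plus a set of unlabeled bounding boxes and produces one RGBA layer per box. The sketch below only illustrates that input, assuming a simple (x0, y0, x1, y1) pixel convention; the exact format expected by the scripts in this repo may differ.

# Illustrative only: the coordinate convention and variable names are assumptions,
# not the exact input schema used by this repo's scripts.
global_caption = "A birthday card with balloons, a cake, and the text 'Happy Birthday'"

# Anonymous regions: bounding boxes only, with no per-layer captions.
anonymous_layout = [
    (0, 0, 512, 512),      # background layer covering the full canvas
    (60, 300, 260, 480),   # region for one foreground element (e.g., the cake)
    (300, 40, 480, 220),   # region for another element (e.g., balloons)
    (120, 20, 400, 90),    # region reserved for a text layer
]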

🌟 Features

  • Anonymous Layout: Requires only a single global caption to generate multiple layers, eliminating the need for individual captions for each layer.
  • High Layer Capacity: Supports the generation of 50+ layers, enabling complex multi-layer outputs.
  • Efficiency: Maintains high efficiency compared to full attention and spatial-temporal attention mechanisms.

🚧 TODO List

  • Release inference code and pretrained model
  • Release training code

Table of Contents

  • Multi-Layer Generation
  • LLM For Layout Planning

Multi-Layer Generation

Environment Setup

1. Create Conda Environment

conda create -n multilayer python=3.10 -y
conda activate multilayer

2. Install Dependencies

pip3 install torch==2.4.0 torchvision==0.19.0
pip install diffusers==0.31.0 transformers==4.44.0 accelerate==0.34.2 peft==0.12.0 datasets==2.20.0
pip install wandb==0.17.7 einops==0.8.0 sentencepiece==0.2.0 mmengine==0.10.4 prodigyopt==1.0

3. Log in to Hugging Face

huggingface-cli login

Quick Start

Run example.py for a quick try:

python example.py

Testing Multi-Layer Generation

1. Download Checkpoints

Create a directory multi_layer_gen/checkpoints and download the following checkpoints into it.

Variable | Description | Action Required
ckpt_dir | Anonymous region transformer checkpoint | Download from Google Drive
transp_vae_ckpt | Multi-layer transparency decoder checkpoint | Download from Google Drive
pre_fuse_lora_dir | LoRA weights to be fused initially | Download from Google Drive
extra_lora_dir | Optional LoRA weights (for aesthetic improvement) | Download from Google Drive

The downloaded checkpoints should be organized as follows:

checkpoints/
├── anonymous_region_transformer_ckpt/
│   ├── layer_pe.pth
│   └── pytorch_lora_weights.safetensors
├── extra_lora/
│   └── pytorch_lora_weights.safetensors
├── pre_fuse_lora/
│   └── pytorch_lora_weights.safetensors
└── transparent_decoder_ckpt.pt
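
Before running the test script, it can help to confirm that the files are laid out as shown above. A minimal check, assuming the checkpoints/ directory sits under multi_layer_gen/ as in the commands below:

import os

# Files expected under multi_layer_gen/checkpoints/, mirroring the tree above.
EXPECTED_FILES = [
    "anonymous_region_transformer_ckpt/layer_pe.pth",
    "anonymous_region_transformer_ckpt/pytorch_lora_weights.safetensors",
    "extra_lora/pytorch_lora_weights.safetensors",
    "pre_fuse_lora/pytorch_lora_weights.safetensors",
    "transparent_decoder_ckpt.pt",
]

root = "multi_layer_gen/checkpoints"
missing = [p for p in EXPECTED_FILES if not os.path.isfile(os.path.join(root, p))]
print("Missing files:", missing if missing else "none, checkpoint layout looks complete")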

2. Run the Testing Script

python multi_layer_gen/test.py \
--cfg_path=multi_layer_gen/configs/multi_layer_resolution512_test.py \
--save_dir=multi_layer_gen/output/ \
--ckpt_dir=multi_layer_gen/checkpoints/anonymous_region_transformer_ckpt \
--transp_vae_ckpt=multi_layer_gen/checkpoints/transparent_decoder_ckpt.pt \
--pre_fuse_lora_dir=multi_layer_gen/checkpoints/pre_fuse_lora \
--extra_lora_dir=multi_layer_gen/checkpoints/extra_lora

3 (optional). A Notebook Example

Please see test.ipynb.

LLM For Layout Planning

Environment

1. Create Conda Environment

conda create -n layoutplanner python=3.10 -y
conda activate layoutplanner

2. Install Dependencies

cd layout_planner
pip install -r requirements_part1.txt
pip install -r requirements_part2.txt

If you encounter the following issues:

  • If you see 'ImportError: libGL.so.1: cannot open shared object file: No such file or directory':

     apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ffmpeg libsm6 libxext6  -y
  • If you need to install FlashAttention-2:

     pip install flash-attn --no-build-isolation

    or get a prebuilt wheel URL from https://github.com/Dao-AILab/flash-attention/releases/ and install it directly:

     pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu118torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

Inference

1. Configure Inference Script

Edit the following parameters in scripts/inference_template.sh:

Variable | Description | Action Required
input_model | Base model checkpoint location | Set to your downloaded model path (Download here)
resume | Path to the trained layout planner checkpoint | Set to your checkpoint path (Download here)
width | Width of the layout for inference | Set the layout width
height | Height of the layout for inference | Set the layout height
save_path | Path to save the generated output JSON | Set your desired save path
do_sample | Whether to use sampling during generation | Set True for sampling, False for greedy decoding
temperature | Sampling temperature (used when do_sample = True) | Adjust as needed
inference_caption | User prompt describing the content of the desired layout | Provide the caption for layout generation

2. Run Inference

bash scripts/inference_template.sh
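
The planner writes its result to the JSON file given by save_path. A quick way to inspect (or hand-edit) the planned layout before feeding it to the multi-layer generation stage is to load and pretty-print that file; the path below is a placeholder for whatever save_path you configured:

import json

save_path = "layout_planner_output.json"  # placeholder: use the save_path from your script

with open(save_path) as f:
    layout = json.load(f)

# Pretty-print the planned layout so the predicted regions can be reviewed
# (or edited by hand) before the multi-layer generation step.
print(json.dumps(layout, indent=2, ensure_ascii=False))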
