This repository supports generating multi-layer transparent images (composed of multiple RGBA image layers) from a global text prompt and an anonymous region layout (bounding boxes without per-layer captions). The anonymous region layout can be either predicted by an LLM or specified manually by the user.
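For intuition, an anonymous region layout is just a global caption plus a set of uncaptioned bounding boxes. The sketch below is purely illustrative: the caption, coordinates, and variable names are made up, and the exact input format expected by this repository is defined by its own scripts (see `example.py` and `multi_layer_gen/test.py`).

```python
# Illustrative only: a global caption plus anonymous (uncaptioned) layer boxes.
# Coordinates are (x1, y1, x2, y2) in pixels on a 512x512 canvas; the real
# format used by this repo may differ.
global_caption = "A birthday card with balloons, a cake, and a greeting text"
layer_boxes = [
    (0, 0, 512, 512),      # background layer
    (40, 30, 230, 210),    # balloons
    (150, 260, 380, 470),  # cake
    (90, 480, 430, 510),   # greeting text
]
```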
- Anonymous Layout: Requires only a single global caption to generate multiple layers, eliminating the need for individual captions for each layer.
- High Layer Capacity: Supports the generation of 50+ layers, enabling complex multi-layer outputs.
- Efficiency: Maintains high efficiency compared to full attention and spatial-temporal attention mechanisms.
- Release inference code and pretrained model
- Release training code
```bash
conda create -n multilayer python=3.10 -y
conda activate multilayer
pip3 install torch==2.4.0 torchvision==0.19.0
pip install diffusers==0.31.0 transformers==4.44.0 accelerate==0.34.2 peft==0.12.0 datasets==2.20.0
pip install wandb==0.17.7 einops==0.8.0 sentencepiece==0.2.0 mmengine==0.10.4 prodigyopt==1.0
```
```bash
huggingface-cli login
```
To give it a quick try, run `example.py`:

```bash
python example.py
```
Create a directory `multi_layer_gen/checkpoints` and download the following checkpoints into it.
| Variable | Description | Action Required |
|---|---|---|
| `ckpt_dir` | Anonymous region transformer checkpoint | Download from Google Drive |
| `transp_vae_ckpt` | Multi-layer transparency decoder checkpoint | Download from Google Drive |
| `pre_fuse_lora_dir` | LoRA weights to be fused initially | Download from Google Drive |
| `extra_lora_dir` | Optional LoRA weights (for aesthetic improvement) | Download from Google Drive |
The downloaded checkpoints should be organized as follows:
```
checkpoints/
├── anonymous_region_transformer_ckpt/
│   ├── layer_pe.pth
│   └── pytorch_lora_weights.safetensors
├── extra_lora/
│   └── pytorch_lora_weights.safetensors
├── pre_fuse_lora/
│   └── pytorch_lora_weights.safetensors
└── transparent_decoder_ckpt.pt
```
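Before running inference, it can be worth confirming that every file in the layout above is actually in place. The snippet below is not part of the repository; it is an optional sanity check based solely on the directory layout shown above.

```python
# Optional sanity check: confirm the checkpoint files listed above exist.
from pathlib import Path

root = Path("multi_layer_gen/checkpoints")
expected = [
    "anonymous_region_transformer_ckpt/layer_pe.pth",
    "anonymous_region_transformer_ckpt/pytorch_lora_weights.safetensors",
    "extra_lora/pytorch_lora_weights.safetensors",
    "pre_fuse_lora/pytorch_lora_weights.safetensors",
    "transparent_decoder_ckpt.pt",
]
missing = [p for p in expected if not (root / p).is_file()]
print("All checkpoints found." if not missing else f"Missing files: {missing}")
```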
Then run the test script:

```bash
python multi_layer_gen/test.py \
    --cfg_path=multi_layer_gen/configs/multi_layer_resolution512_test.py \
    --save_dir=multi_layer_gen/output/ \
    --ckpt_dir=multi_layer_gen/checkpoints/anonymous_region_transformer_ckpt \
    --transp_vae_ckpt=multi_layer_gen/checkpoints/transparent_decoder_ckpt.pt \
    --pre_fuse_lora_dir=multi_layer_gen/checkpoints/pre_fuse_lora \
    --extra_lora_dir=multi_layer_gen/checkpoints/extra_lora
```
Please see `test.ipynb`.
```bash
conda create -n layoutplanner python=3.10 -y
conda activate layoutplanner
cd layout_planner
pip install -r requirements_part1.txt
pip install -r requirements_part2.txt
```
If you run into issues during installation:

- If you get `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`, run:

  ```bash
  apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ffmpeg libsm6 libxext6 -y
  ```

- If you need to add flash-attn-2, run:

  ```bash
  pip install flash-attn --no-build-isolation
  ```

  or get a prebuilt wheel URL from https://github.com/Dao-AILab/flash-attention/releases/ and install it directly:

  ```bash
  pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu118torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
  ```
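If you install FlashAttention-2, a quick import check can confirm that the wheel matches your Python, CUDA, and PyTorch versions. This check is not part of the repository; it only verifies that the package imports.

```python
# Optional check that flash-attn installed correctly in this environment.
import flash_attn

print(flash_attn.__version__)  # e.g. 2.5.8
```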
Edit the following parameters in `scripts/inference_template.sh`:
| Variable | Description | Action Required |
|---|---|---|
| `input_model` | Base model checkpoint location | Set to your downloaded model path (Download here) |
| `resume` | Path to the trained layout planner checkpoint | Set to your checkpoint path (Download here) |
| `width` | Width of the layout for inference | Set the layout width for inference |
| `height` | Height of the layout for inference | Set the layout height for inference |
| `save_path` | Path to save the generated output JSON | Set your desired save path |
| `do_sample` | Whether to use sampling for generation | Set `True` for sampling, `False` for greedy decoding |
| `temperature` | Sampling temperature when `do_sample=True` | Adjust as needed |
| `inference_caption` | User input describing the content of the desired layout | Provide the desired caption for layout generation |
Then run:

```bash
bash scripts/inference_template.sh
```
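The layout planner writes its result to the JSON file given by `save_path`. The snippet below is only a hypothetical way to inspect that file; the output schema is not documented here, and the path is a placeholder you should replace with your own `save_path`.

```python
# Hypothetical inspection of the planner's output; replace the path with your save_path.
import json

with open("output/layout.json") as f:
    layout = json.load(f)
print(json.dumps(layout, indent=2))  # inspect the predicted layout before multi-layer generation
```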