Controllable Text-to-Image Generation with Customized Guidance on Appearance and Position (Stable Diffusion)
This implementation controls a specific target mentioned in the prompt, focusing on its appearance and position. It lets you transfer the target's appearance from one generated image to a new generated image, and specify where the target appears using box coordinates. By extracting features from the cross-attention layers and using them as guidance, the method works without any model training or fine-tuning, unlike LoRA and ControlNet. It relies entirely on the cross-attention mechanism already present in Stable Diffusion for accurate and efficient feature transfer and positioning.
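For intuition, here is a minimal, hypothetical sketch of how such cross-attention guidance is typically wired into a denoising step; the names `energy_fn`, `attn_store`, and `scale` are illustrative placeholders, not the exact API used in the notebook:

```python
import torch

def guided_step(latents, t, unet, text_emb, energy_fn, attn_store, scale=30.0):
    """One guided denoising step: back-propagate an attention-based energy
    to the noisy latent before the scheduler update (illustrative sketch)."""
    latents = latents.detach().requires_grad_(True)
    # The edited UNet is assumed to record its cross-attention maps in attn_store
    # during this forward pass.
    noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample
    energy = energy_fn(attn_store)  # e.g. an appearance or layout loss
    grad = torch.autograd.grad(energy, latents)[0]
    # Nudge the latent against the energy gradient; the scheduler then uses
    # noise_pred and the shifted latent as usual.
    return noise_pred, latents.detach() - scale * grad
```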
Given a single reference image, the method can largely retain the details of the target concept.
The method can also be applied to attributes other than appearance, such as position and size; the only difference is the design of the loss function (energy function).
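As one concrete example, a position (layout) energy can simply measure how much of the target token's cross-attention mass falls inside a user-specified box. The sketch below assumes a normalized box `(x0, y0, x1, y1)` and a single-token attention map; it illustrates the idea rather than reproducing the exact loss in the notebook:

```python
import torch

def box_energy(attn_map: torch.Tensor, box) -> torch.Tensor:
    """Lower energy when most of the token's attention mass falls inside the box.
    attn_map: (H, W) cross-attention map for the target token.
    box: (x0, y0, x1, y1) in normalized [0, 1] coordinates."""
    H, W = attn_map.shape
    x0, y0, x1, y1 = box
    inside = attn_map[int(y0 * H):int(y1 * H), int(x0 * W):int(x1 * W)].sum()
    total = attn_map.sum() + 1e-8
    return (1.0 - inside / total) ** 2
```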

It is also easy to learn multiple reference images for separate targets and combine them into one generated image.
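One way to combine several targets is to sum their per-target energies into a single guidance objective, as sketched below. The helpers `attn_store.get_map` and `appearance_energy` are hypothetical placeholders for the notebook's actual attention accessor and appearance loss:

```python
def combined_energy(attn_store, targets):
    """Sum appearance and layout energies over all targets (illustrative sketch).
    targets: list of dicts with 'token_idx', 'reference_feat', and an optional 'box'."""
    total = 0.0
    for tgt in targets:
        attn = attn_store.get_map(token_idx=tgt["token_idx"])            # hypothetical accessor
        total = total + appearance_energy(attn, tgt["reference_feat"])    # hypothetical loss
        if tgt.get("box") is not None:
            total = total + box_energy(attn, tgt["box"])
    return total
```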

The .ipynb notebook contains the pipeline, and the edited UNet is in the my_model directory. Please load the UNet from my_model rather than from the diffusers package. See Poster.pdf for details.
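Loading the edited UNet with diffusers looks roughly like this; the base checkpoint id and dtype below are illustrative and should match whatever the notebook uses:

```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Load the edited UNet from the local my_model directory, not the stock checkpoint.
unet = UNet2DConditionModel.from_pretrained("my_model", torch_dtype=torch.float16)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative base model
    unet=unet,
    torch_dtype=torch.float16,
).to("cuda")
```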
The code builds on the implementations of the following papers:
Training-Free Layout Control with Cross-Attention Guidance.
Diffusion Self-Guidance for Controllable Image Generation.
Prompt-to-Prompt Image Editing with Cross-Attention Control.