VisuNex, a fork of the original Stable Diffusion repository, is an attempt to personalize text-to-image generation by letting users tailor image creation to their own aesthetic preferences.

Malav5372/VisuNex

VisuNex: Personalized Text-to-Image Generation Using Stable Diffusion 🎨

Project Motive:

The core motive behind VisuNex is to let users infuse their own aesthetics into the visuals it generates. Traditional text-to-image models excel at transforming textual descriptions into images, but they fall short when users try to convey intricate aesthetic preferences through words alone. VisuNex addresses this challenge head-on, aiming to capture and replicate the subtle nuances of the desired aesthetics.

Prerequisites

This is a fork of the original stable-diffusion repository, so the prerequisites are the same as for the original repository.

Personalization Variants:

VisuNex offers personalized variants built on the SAC (Simulacra Aesthetic Captions) 8+ and LAION 7+ aesthetic embeddings. These give users further creative control over image generation.

LAION and SAC Context:

It's important to note that VisuNex builds upon two existing components: SAC (Simulacra Aesthetic Captions) and the LAION aesthetic predictor (LAION-Aesthetics_Predictor V1). The LAION predictor is a linear model trained on 5000 image-rating pairs sourced from the SAC dataset. SAC itself comprises over 238,000 synthetic images generated using AI models, including CompVis latent GLIDE and Stable Diffusion. Both components are integral to the personalized text-to-image generation framework.

In VisuNex, CLIP image embeddings generated by the OpenAI CLIP ViT-L/14 model serve as inputs, allowing for a deeper understanding of user aesthetics and resulting in a highly personalized text-to-image generation experience.

VisuNex represents a significant leap in text-to-image generation, offering a versatile, creative, and user-centric approach. It goes beyond traditional text-based descriptions, enabling users to define their aesthetic language for image creation, ultimately delivering a personalized and remarkable image synthesis experience.

Usage

You can use the same arguments as with the original stable diffusion repository. The script scripts/txt2img.py has the additional arguments:

  • --aesthetic_steps: number of optimization steps when doing the personalization. For a given prompt, it is recommended to start with few steps (2 or 3), and then gradually increase it (trying 5, 10, 15, 20, etc). The greater the value, the more the resulting image will be biased towards the aesthetic embedding.
  • --aesthetic_lr: learning rate for the aesthetic gradient optimization. The default value is 0.0001. This value usually works well enough, so in most cases you only need to tune the previous argument.
  • --aesthetic_embedding: path to the stored pytorch tensor (.pt format) containing the aesthetic embedding. It must be of shape 1x768 (CLIP-L/14 size). See below for computing your own aesthetic embeddings.
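
The recommendation to start with few steps and increase gradually can be scripted. The following is a minimal sketch, not part of the repository: the helper name build_sweep and its defaults are hypothetical, and it only assembles command lines from the flags described above so that the same prompt and seed can be compared across several values of --aesthetic_steps.

```python
import shlex


def build_sweep(prompt, embedding_path, steps_schedule=(2, 3, 5, 10, 15, 20),
                seed=332, aesthetic_lr=0.0001):
    """Return one txt2img.py command line per value of --aesthetic_steps.

    Hypothetical helper: it only formats strings and does not run anything.
    """
    commands = []
    for steps in steps_schedule:
        args = [
            "python", "scripts/txt2img.py",
            "--prompt", prompt,
            "--plms",
            "--seed", str(seed),  # fixed seed so only the step count varies
            "--aesthetic_steps", str(steps),
            "--aesthetic_lr", str(aesthetic_lr),
            "--aesthetic_embedding", embedding_path,
        ]
        # shlex.join quotes the prompt so the line can be pasted into a shell
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for cmd in build_sweep("A portal towards other dimension",
                           "aesthetic_embeddings/sac_8plus.pt"):
        print(cmd)
```

Keeping the seed fixed across the sweep means differences between the outputs should come from the personalization strength rather than from sampling noise.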

In this repository we include all the aesthetic embeddings used in the paper. All of them are in the directory aesthetic_embeddings:

  • sac_8plus.pt
  • laion_7plus.pt
  • aivazovsky.pt
  • cloudcore.pt
  • gloomcore.pt
  • glowwave.pt

In addition, new aesthetic embeddings have been incorporated:

Examples

Let's see some examples now. This would be with the un-personalized, original SD model:

python scripts/txt2img.py --prompt "Roman city on top of a ridge, sci-fi illustration by Greg Rutkowski #sci-fi detailed vivid colors gothic concept illustration by James Gurney and Zdzislaw Beksiński vivid vivid colorsg concept illustration colorful interior" --seed 332 --plms  --aesthetic_steps 0 --W 768 --aesthetic_embedding aesthetic_embeddings/laion_7plus.pt

image

If we now personalize it with the LAION embedding, note how the images get more floral patterns, as this is one common pattern of the LAION aesthetics dataset:

python scripts/txt2img.py --prompt "Roman city on top of a ridge, sci-fi illustration by Greg Rutkowski #sci-fi detailed vivid colors gothic concept illustration by James Gurney and Zdzislaw Beksiński vivid vivid colorsg concept illustration colorful interior" --seed 332 --plms  --aesthetic_steps 5 --W 768 --aesthetic_embedding aesthetic_embeddings/laion_7plus.pt

image

Increasing the number of steps more...

python scripts/txt2img.py --prompt "Roman city on top of a ridge, sci-fi illustration by Greg Rutkowski #sci-fi detailed vivid colors gothic concept illustration by James Gurney and Zdzislaw Beksiński vivid vivid colorsg concept illustration colorful interior" --seed 332 --plms  --aesthetic_steps 8 --W 768 --aesthetic_embedding aesthetic_embeddings/laion_7plus.pt

image
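
Why more steps give a stronger bias can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the repository's implementation: it treats the text embedding itself as the trainable parameter (in the actual method the optimization acts on the model rather than on a raw vector), uses plain gradient ascent on cosine similarity, and uses 3-dimensional vectors and a large learning rate purely for illustration. The function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(normalize(a), normalize(b))


def personalize(text_emb, aesthetic_emb, steps, lr):
    """Toy gradient ascent on cos(text_emb, aesthetic_emb).

    Assumption: the text embedding is the parameter being optimized,
    which is a simplification made only for this illustration.
    """
    e = normalize(aesthetic_emb)
    t = list(text_emb)
    for _ in range(steps):
        norm = math.sqrt(dot(t, t))
        u = [x / norm for x in t]
        sim = dot(u, e)
        # Gradient of cosine similarity w.r.t. t: (e - sim * u) / |t|
        grad = [(ei - sim * ui) / norm for ei, ui in zip(e, u)]
        t = [ti + lr * gi for ti, gi in zip(t, grad)]
    return t


if __name__ == "__main__":
    text = [1.0, 0.0, 0.0]       # stand-in for a prompt embedding
    aesthetic = [0.0, 1.0, 0.0]  # stand-in for an aesthetic embedding
    for steps in (0, 5, 20):
        moved = personalize(text, aesthetic, steps, lr=0.1)
        print(steps, round(cosine(moved, aesthetic), 4))
```

With zero steps the vector is returned unchanged, which mirrors how --aesthetic_steps 0 reproduces the un-personalized output; each additional step moves it closer to the aesthetic direction.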

In another example, we use a different embedding that further accentuates the floral patterns. This is the original SD output:

python scripts/txt2img.py --prompt "Cyberpunk ikea, close up shot from the top, anime art, greg rutkowski, studio ghibli, dramatic lighting" --seed 332 --plms --ckpt ../stable-diffusion/sd-v1-4.ckpt --H 768 --aesthetic_steps 0  --aesthetic_embedding aesthetic_embeddings/flower_plant.pt

image

And this is with 20 steps with the flower_plant.pt embedding:

python scripts/txt2img.py --prompt "Cyberpunk ikea, close up shot from the top, anime art, greg rutkowski, studio ghibli, dramatic lighting" --seed 332 --plms --ckpt ../stable-diffusion/sd-v1-4.ckpt --H 768 --aesthetic_steps 20  --aesthetic_embedding aesthetic_embeddings/flower_plant.pt

image

Let's see another example:

python scripts/txt2img.py --prompt "A portal towards other dimension" --plms  --seed 332 --aesthetic_steps 15 --aesthetic_embedding aesthetic_embeddings/sac_8plus.pt

image

If we increase it to 20 steps, we get a more pronounced effect:

python scripts/txt2img.py --prompt "A portal towards other dimension" --plms  --seed 332 --aesthetic_steps 20 --aesthetic_embedding aesthetic_embeddings/sac_8plus.pt

image

We can set the steps to 0 to get the outputs for the original stable diffusion model:

python scripts/txt2img.py --prompt "A portal towards other dimension" --plms  --seed 332 --aesthetic_steps 0 --aesthetic_embedding aesthetic_embeddings/sac_8plus.pt

image

Note that since we have used the SAC dataset for the personalization, the optimized results are more biased towards fantasy aesthetics.

To see more examples, look at the Further resources section below, or have a look at https://arxiv.org/abs/2209.12330

Using your own embeddings

If you want to use your own aesthetic embeddings from a set of images, you can use the script scripts/gen_aesthetic_embedding.py. This script takes as input a directory containing images, and outputs a pytorch tensor containing the aesthetic embedding, so you can use it as in the previous commands.
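
Conceptually, an aesthetic embedding summarizes a set of images as a single vector. The sketch below assumes, without confirmation from this README, that the summary is the mean of unit-normalized image embeddings, re-normalized afterwards; check scripts/gen_aesthetic_embedding.py for the actual procedure. It uses plain Python lists as stand-ins for CLIP image embeddings, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aesthetic_embedding(image_embeddings):
    """Average unit-normalized embeddings and re-normalize the mean.

    Assumption: this aggregation is illustrative and may differ from
    what scripts/gen_aesthetic_embedding.py actually does.
    Returns a nested list of shape 1 x dim, mirroring the 1x768 layout
    expected by --aesthetic_embedding.
    """
    if not image_embeddings:
        raise ValueError("need at least one image embedding")
    unit = [l2_normalize(v) for v in image_embeddings]
    dim = len(unit[0])
    mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
    return [l2_normalize(mean)]


if __name__ == "__main__":
    # Two fake 2-dimensional "image embeddings" of different magnitude
    emb = aesthetic_embedding([[2.0, 0.0], [0.0, 3.0]])
    print(len(emb), len(emb[0]))
```

Normalizing each vector before averaging keeps an image with a large-magnitude embedding from dominating the result.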

Here are some examples using an embedding computed from three works by the painter Aivazovsky (reference_images/aivazovsky):

python scripts/txt2img.py --prompt "a painting of a tree, oil on canvas" --plms  --seed 332 --aesthetic_steps 50 --aesthetic_embedding aesthetic_embeddings/aivazovsky.pt

image

Note that just adding the modifier "by Aivazovsky" to the prompt does not work as well:

python scripts/txt2img.py --prompt "a painting of a tree, oil on canvas by Aivazovsky" --plms --seed 332 --aesthetic_steps 0 --aesthetic_embedding aesthetic_embeddings/aivazovsky.pt

image

Another example, mixing the styles of two painters (one in the prompt, the other as the aesthetic embedding):

python scripts/txt2img.py --prompt "a gothic cathedral in a stunning landscape by Jean-Honoré Fragonard" --plms --seed 139782398 --aesthetic_steps 12 --aesthetic_embedding aesthetic_embeddings/aivazovsky.pt

image

Whereas the original SD would output this:

python scripts/txt2img.py --prompt "a gothic cathedral in a stunning landscape by Jean-Honoré Fragonard" --plms --seed 139782398 --aesthetic_steps 0 --aesthetic_embedding aesthetic_embeddings/aivazovsky.pt

image

Using it with other fine-tuned SD models

The aesthetic gradients technique can be used with any fine-tuned SD model.

python scripts/txt2img.py --prompt "robotic cat with wings" --plms --seed 7 --ckpt ../stable-diffusion/ema-only-epoch\=000142.ckpt  --aesthetic_steps 15 --aesthetic_embedding aesthetic_embeddings/laion_7plus.pt

image

The previous prompt was personalized with the LAION aesthetics embedding, so the result looks more childlike than the output of the original model alone:

python scripts/txt2img.py --prompt "robotic cat with wings" --plms --seed 7 --ckpt ../stable-diffusion/ema-only-epoch\=000142.ckpt  --aesthetic_steps 0 --aesthetic_embedding aesthetic_embeddings/laion_7plus.pt

image

Further resources
