[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

viiika/Meissonic

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis


📝 Meissonic Updates and Family Papers

Meissonic Demos

🚀 Introduction

Meissonic is a non-autoregressive, masked image modeling (MIM) text-to-image synthesis model that generates high-resolution images and is designed to run on consumer graphics cards.

Architecture

Key Features:

  • 🖼️ High-resolution image generation (up to 1024x1024)
  • 💻 Designed to run on consumer GPUs
  • 🎨 Versatile applications: text-to-image, image-to-image
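
For intuition, non-autoregressive masked generation usually starts from a fully masked token grid and reveals many tokens in parallel at each step. The sketch below assumes a MaskGIT-style cosine schedule, which this README does not specify. The function name and the 4096-token grid are illustrative assumptions, not values taken from the Meissonic code.

```python
import math


def masked_token_counts(total_tokens: int, steps: int) -> list[int]:
    """Return how many tokens remain masked after each decoding step.

    Assumes a cosine schedule: the masked fraction after step t of T
    is cos(pi/2 * t/T), so few tokens are revealed early and many late.
    """
    counts = []
    for step in range(1, steps + 1):
        masked_fraction = math.cos(math.pi / 2 * step / steps)
        counts.append(math.floor(masked_fraction * total_tokens))
    counts[-1] = 0  # the final step must reveal every token
    return counts


if __name__ == "__main__":
    # 64 steps matches the benchmark setting in this README;
    # 4096 tokens (a 64x64 grid) is an assumed example size.
    counts = masked_token_counts(total_tokens=4096, steps=64)
    print("masked after first 3 steps:", counts[:3])
    print("masked after last 3 steps:", counts[-3:])
```

Because every step predicts all masked positions at once, the number of forward passes equals the step count rather than the token count, which is the main efficiency argument for this family of models.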

🛠️ Prerequisites

Step 1: Clone the repository

git clone https://github.com/viiika/Meissonic/
cd Meissonic

Step 2: Create virtual environment

conda create --name meissonic python
conda activate meissonic
pip install -r requirements.txt

Step 3: Install diffusers

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
cd ..  # return to the Meissonic root before running the scripts below

💡 Inference Usage

Gradio Web UI

python app.py

Command-line Interface

Text-to-Image Generation

python inference.py --prompt "Your creative prompt here"
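
To generate images for several prompts in one go, the command above can be wrapped in a small driver. This is a minimal sketch that relies only on the `--prompt` flag shown in this README. The helper names `build_inference_cmd` and `run_prompts` are hypothetical, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_inference_cmd(prompt: str, script: str = "inference.py") -> list[str]:
    """Build the argument list for one text-to-image run."""
    # Passing a list (not a shell string) avoids quoting problems in prompts.
    return [sys.executable, script, "--prompt", prompt]


def run_prompts(prompts: list[str]) -> None:
    """Run inference once per prompt, stopping at the first failure."""
    for prompt in prompts:
        subprocess.run(build_inference_cmd(prompt), check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_prompts(...) to execute them.
    for prompt in [
        "A pillow with a picture of a Husky on it.",
        "A white coffee mug, a solid black background",
    ]:
        print(build_inference_cmd(prompt))
```

Each call starts a new process and reloads the model, so this is convenient rather than fast.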

Inpainting and Outpainting

python inpaint.py --mode inpaint --input_image path/to/image.jpg
python inpaint.py --mode outpaint --input_image path/to/image.jpg

Advanced: FP8 Quantization

Reduce memory usage with FP8 quantization (see the benchmarks below):

Requirements:

  • CUDA 12.4
  • PyTorch 2.4.1
  • TorchAO

Note: Windows users can install TorchAO using:

pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cpu
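
Before running the FP8 scripts, it can help to confirm which versions are installed. The sketch below uses only the Python standard library to look up package metadata. The helper names are hypothetical, and the CUDA version is not checked here.

```python
from importlib import metadata
from typing import Optional

# Versions listed in this README; None means no specific version is given.
REQUIRED = {"torch": "2.4.1", "torchao": None}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def requirement_report() -> list[str]:
    """Produce one human-readable status line per required package."""
    lines = []
    for package, wanted in REQUIRED.items():
        found = installed_version(package)
        if found is None:
            status = "not installed"
        elif wanted is not None and not found.startswith(wanted):
            status = f"found {found}, README lists {wanted}"
        else:
            status = f"found {found}"
        lines.append(f"{package}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(requirement_report()))
```

The prefix comparison is deliberate, since PyTorch wheels often carry a local suffix such as a CUDA tag after the version number.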

Command-line inference

python inference_fp8.py --quantization fp8

Gradio Web UI for FP8 (select the quantization method under Advanced settings)

python app_fp8.py

Performance Benchmarks

| Precision (Steps=64, Resolution=1024x1024) | Batch Size=1 (Avg. Time) | Memory Usage |
|---|---|---|
| FP32 | 13.32s | 12GB |
| FP16 | 12.35s | 9.5GB |
| FP8 | 12.93s | 8.7GB |
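
To compare the precisions, the figures above can be expressed as savings relative to FP32. The sketch below only does arithmetic on the published numbers. The dictionary layout and the function name are illustrative.

```python
# (average time in seconds, memory in GB), copied from the benchmark table.
BENCHMARKS = {
    "FP32": (13.32, 12.0),
    "FP16": (12.35, 9.5),
    "FP8": (12.93, 8.7),
}


def relative_savings(precision: str, baseline: str = "FP32") -> dict[str, float]:
    """Return percentage time and memory saved versus the baseline precision."""
    time_s, mem_gb = BENCHMARKS[precision]
    base_time_s, base_mem_gb = BENCHMARKS[baseline]
    return {
        "time_saved_pct": 100 * (base_time_s - time_s) / base_time_s,
        "memory_saved_pct": 100 * (base_mem_gb - mem_gb) / base_mem_gb,
    }


if __name__ == "__main__":
    for name in ("FP16", "FP8"):
        savings = relative_savings(name)
        print(
            f"{name}: {savings['time_saved_pct']:.1f}% faster, "
            f"{savings['memory_saved_pct']:.1f}% less memory than FP32"
        )
```

On these numbers FP8 gives the largest memory reduction (about 27.5% versus FP32), while FP16 remains slightly faster, so FP8 is mainly useful when GPU memory is the constraint.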

🎨 Showcase

"A pillow with a picture of a Husky on it."

"A white coffee mug, a solid black background"

🎓 Training

To train Meissonic, follow these steps:

  1. Install dependencies:

    cd train
    pip install -r requirements.txt
  2. Download the Meissonic base model from Hugging Face.

  3. Prepare your dataset:

  4. Start training:

    bash train/train.sh

Note: For custom datasets, you'll likely need to implement your own dataset class.
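
As a starting point for such a class, here is a minimal sketch of a map-style dataset that pairs image paths with captions. It assumes a hypothetical JSON Lines manifest with `image` and `caption` fields. The class name, the field names, and the returned keys are illustrative, and the format the training script actually expects may differ.

```python
import json
import tempfile
from pathlib import Path


class CaptionedImageDataset:
    """Map-style dataset over a JSON Lines manifest (one record per line).

    Only __len__ and __getitem__ are implemented, which is the protocol
    that map-style data loaders rely on. Image decoding is left to the caller.
    """

    def __init__(self, manifest_path: str):
        self.root = Path(manifest_path).parent
        with open(manifest_path, encoding="utf-8") as f:
            # Skip blank lines so a trailing newline does not break parsing.
            self.records = [json.loads(line) for line in f if line.strip()]

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> dict:
        record = self.records[index]
        return {
            "image_path": self.root / record["image"],  # relative to manifest
            "prompt": record["caption"],
        }


if __name__ == "__main__":
    # Build a tiny throwaway manifest to show the expected layout.
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "train.jsonl"
        manifest.write_text(
            json.dumps({"image": "husky.jpg", "caption": "A husky pillow"}) + "\n",
            encoding="utf-8",
        )
        dataset = CaptionedImageDataset(str(manifest))
        print(len(dataset), dataset[0]["prompt"])
```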

📚 Citation

If you find this work helpful, please consider citing:

@article{bai2024meissonic,
  title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
  author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2410.08261},
  year={2024}
}

🙏 Acknowledgements

We thank the community and contributors for their invaluable support in developing Meissonic. We thank apolinario@multimodal.art for creating the Meissonic demo, @NewGenAI and @飛鷹しずか@自称文系プログラマの勉強 for making YouTube tutorials, @pprp for implementing FP8 and INT4 quantization, @camenduru for making the Jupyter tutorial, @chenxwh for making the Replicate demo and API, Collov Labs for reproducing Monetico, and Shitong et al. for identifying effective design choices for enhancing visual quality.


Made with ❤️ by MeissonFlow Research
