Junfeng Wu1,2 · Yi Jiang2† · Chuofan Ma2,3
Yuliang Liu1 · Hengshuang Zhao3
Zehuan Yuan2 · Song Bai2* · Xiang Bai1*
1HUST 2ByteDance 3HKU
†project lead *corresponding author
This repo implements Liquid, a scalable and unified autoregressive generation paradigm that seamlessly integrates multimodal comprehension and generation.
2025-02-28: Paper, demo, model, and project page for Liquid are all released.
- Liquid-7B (Mix-pretrained Multimodal Model with T2I and Language Ability)
  - Web Demo
  - Inference
  - Checkpoints
- Liquid-7B-Multiratio (Multi-Ratio Image Generation Model)
  - Web Demo
  - Inference
  - Checkpoints
- Liquid-7B-IT (Instruction-Tuned Multimodal Model with Instruction-Following Ability)
  - Web Demo
  - Inference
  - Checkpoints
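For a quick start with the checkpoints listed above, a minimal inference sketch is shown below. It assumes the released weights load through the standard Hugging Face transformers interface; the checkpoint path is a placeholder, so substitute the actual path from the checkpoint links (and add `trust_remote_code=True` if the released model requires custom code).

```python
# Minimal text-generation sketch, assuming the Liquid checkpoints load through
# the standard Hugging Face transformers API.
# "path/to/Liquid-7B-IT" is a placeholder, not the actual repository id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/Liquid-7B-IT"  # placeholder: use the released checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

prompt = "Describe a rainy street at night."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```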
We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration with a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop inevitably caused by unified training of visual and language tasks diminishes as the model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other.
- Liquid: a scalable and versatile unified multimodal generator that supports visual understanding, visual generation, and multimodal generation.
- Liquid generates high-quality, photorealistic images at any aspect ratio from language prompts in an autoregressive paradigm.
- Liquid exhibits a clear scaling law in multimodal generation across model sizes (0.5B to 32B).
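To make the unified token space idea concrete, here is a conceptual sketch (not the released implementation): image tokens from a VQ tokenizer are offset into the same id space as text tokens, so a single LLM is trained with plain next-token prediction over both modalities. The vocabulary sizes and offset scheme below are illustrative assumptions.

```python
# Conceptual sketch of a unified text + image token space (illustrative only;
# the vocabulary sizes and offset scheme are assumptions, not Liquid's exact values).
TEXT_VOCAB_SIZE = 32_000      # assumed text tokenizer vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192   # assumed VQ image codebook size

def to_unified_ids(text_ids: list[int], image_codes: list[int]) -> list[int]:
    """Map VQ image codes into the same id space as text tokens by offsetting
    them past the text vocabulary, then concatenate into one sequence."""
    image_ids = [TEXT_VOCAB_SIZE + code for code in image_codes]
    # A single autoregressive LLM does next-token prediction over this sequence:
    # text -> image ordering trains generation, image -> text trains comprehension.
    return text_ids + image_ids

# Example: a short caption's token ids followed by the first few VQ codes of an image.
print(to_unified_ids([101, 2009, 2003], [17, 4096, 250]))
```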
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project useful, please consider citing:
@article{liquid,
  title={Liquid: Language Models are Scalable and Unified Multi-modal Generators},
  author={Wu, Junfeng and Jiang, Yi and Ma, Chuofan and Liu, Yuliang and Zhao, Hengshuang and Yuan, Zehuan and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2412.04332},
  year={2024}
}