<div align="center">
<br>
<h3>Factorized Visual Tokenization and Generation</h3>

[Zechen Bai](https://www.baizechen.site/) <sup>1</sup>
[Jianxiong Gao](https://jianxgao.github.io/) <sup>2</sup>
[Ziteng Gao](https://sebgao.github.io/) <sup>1</sup>
[Pichao Wang](https://wangpichao.github.io/) <sup>3</sup>
[Zheng Zhang](https://scholar.google.com/citations?user=k0KiE4wAAAAJ&hl=en) <sup>3</sup>
[Tong He](https://hetong007.github.io/) <sup>3</sup>
[Mike Zheng Shou](https://sites.google.com/view/showlab) <sup>1</sup>

arXiv 2024

<sup>1</sup> [Show Lab, National University of Singapore](https://sites.google.com/view/showlab/home) <sup>2</sup> Fudan University <sup>3</sup> Amazon

[arXiv](https://arxiv.org/abs/2411.16681)

</div>

**News**
* **[2024-11-28]** The code and model will be released soon after internal approval!
* **[2024-11-26]** We released our paper on [arXiv](https://arxiv.org/abs/2411.16681).

## TL;DR
FQGAN is a state-of-the-art visual tokenizer with a novel factorized tokenization design, surpassing VQ- and LFQ-based methods in discrete image reconstruction.

<p align="center"> <img src="assets/rfid_teaser.jpg" width="555"></p>

## Method Overview

FQGAN addresses the poor codebook usage of large codebooks by decomposing a single large codebook into multiple independent sub-codebooks.
By leveraging disentanglement regularization and representation learning objectives, the sub-codebooks learn hierarchical, structured, and semantically meaningful representations.
As a result, FQGAN achieves state-of-the-art performance on discrete image reconstruction, surpassing VQ- and LFQ-based methods.

<p align="center"> <img src="assets/framework.jpg" width="888"></p>

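For readers who think in code, here is a minimal sketch of the factorized quantization idea described above. It is not the released implementation: the class name `FactorizedQuantizer`, the hyperparameters, and the cosine-similarity penalty standing in for the disentanglement regularizer are illustrative placeholders, and the representation-learning objectives and usual VQ commitment losses are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorizedQuantizer(nn.Module):
    """Toy sketch: replace one large codebook with K independent sub-codebooks,
    each quantizing its own slice of the encoder latent."""

    def __init__(self, num_sub=2, codebook_size=8192, sub_dim=8, disent_weight=0.1):
        super().__init__()
        self.sub_dim = sub_dim
        self.disent_weight = disent_weight
        self.codebooks = nn.ModuleList(
            [nn.Embedding(codebook_size, sub_dim) for _ in range(num_sub)]
        )

    def forward(self, z):
        # z: (B, N, num_sub * sub_dim) continuous latents; one slice per sub-codebook.
        quantized, indices = [], []
        for z_k, codebook in zip(z.split(self.sub_dim, dim=-1), self.codebooks):
            w = codebook.weight                                   # (C, sub_dim)
            # squared L2 distance from each latent to every code in this sub-codebook
            dist = (z_k.pow(2).sum(-1, keepdim=True)
                    - 2 * z_k @ w.t()
                    + w.pow(2).sum(-1))                           # (B, N, C)
            idx = dist.argmin(dim=-1)                             # (B, N)
            q_k = codebook(idx)
            # straight-through estimator so gradients flow back to the encoder
            quantized.append(z_k + (q_k - z_k).detach())
            indices.append(idx)

        # placeholder disentanglement term: discourage the two branches from
        # encoding the same information by penalizing their cosine similarity
        reg = z.new_zeros(())
        if len(quantized) == 2:
            a = F.normalize(quantized[0], dim=-1)
            b = F.normalize(quantized[1], dim=-1)
            reg = self.disent_weight * (a * b).sum(-1).abs().mean()

        return torch.cat(quantized, dim=-1), indices, reg
```

Under this sketch, an image is represented by one token map per sub-codebook; the decoder (not shown) consumes the concatenated quantized features, and a downstream autoregressive generator then has to predict the factorized codes at each position (e.g., with one prediction head per sub-codebook).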
## Comparison with previous visual tokenizers
<p align="center"> <img src="assets/Tab_Tok.png" width="666"></p>

## What has each sub-codebook learned?
<p align="center"> <img src="assets/tsne_dual_codebook.jpg" width="666"></p>

<p align="center"> <img src="assets/recon_codebook.jpg" width="666"></p>

## Can this tokenizer be used for downstream image generation?

<p align="center"> <img src="assets/Tab_AR.png" width="666"></p>
<p align="center"> <img src="assets/AR_gen.jpg" width="888"></p>

## Citation
To cite the paper and model, please use the following BibTeX entry:
```bibtex
@article{bai2024factorized,
  title={Factorized Visual Tokenization and Generation},
  author={Bai, Zechen and Gao, Jianxiong and Gao, Ziteng and Wang, Pichao and Zhang, Zheng and He, Tong and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2411.16681},
  year={2024}
}
```