- [2024/12/10] 🔥 RCDMs is accepted by AAAI 2025.
- [2024/08/08] 🔥 We release the train and test code of RCDMs.
- [2024/07/02] 🔥 We release the paper of RCDMs for story generation.
Recent research showcases the considerable potential of conditional diffusion models for generating consistent stories. However, current methods, which predominantly generate stories in an autoregressive and excessively caption-dependent manner, often underrate the contextual consistency and relevance of frames during sequential generation. To address this, we propose Rich-contextual Conditional Diffusion Models (RCDMs), a novel two-stage approach designed to enhance the semantic and temporal consistency of story generation. Specifically, in the first stage, a frame-prior transformer diffusion model predicts the frame semantic embedding of the unknown clip by aligning the semantic correlations between the captions and frames of the known clip. The second stage establishes a robust model with rich contextual conditions, including reference images of the known clip, the predicted frame semantic embedding of the unknown clip, and text embeddings of all captions. By jointly injecting these rich contextual conditions at the image and feature levels, RCDMs can generate semantically and temporally consistent stories. Moreover, unlike autoregressive models, RCDMs can generate a consistent story in a single forward inference. Qualitative and quantitative results demonstrate that the proposed RCDMs outperform prior methods in challenging scenarios.
Story visualization aims to depict a continuous narrative through multiple captions and reference clips. It has profound applications in game development and comic drawing. Due to the technological leaps in generative models, text-to-image synthesis methods can now generate visually faithful images through text descriptions. However, generating a continuous story with style and temporal consistency still poses significant challenges. Our proposed Rich-contextual Conditional Diffusion Models (RCDMs) tackle these issues by introducing a two-stage diffusion model framework that incorporates rich contextual information at both the image and feature levels.
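To make the two-stage design above concrete, below is a minimal PyTorch sketch of the idea. It is not the repository's actual implementation: module names, dimensions, the learnable-query design, and the simple concatenation/bias fusion are illustrative assumptions (the real stage-2 model is a conditional latent diffusion denoiser).

```python
# Minimal sketch of the two-stage RCDMs idea (illustrative assumptions throughout:
# module names, dimensions, and the fusion scheme are NOT the repository's real code).
import torch
import torch.nn as nn


class FramePriorTransformer(nn.Module):
    """Stage 1: predict the frame semantic embedding of the unknown clip
    from caption embeddings and the known clip's frame embeddings."""

    def __init__(self, dim=768, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learnable query for the unknown frame

    def forward(self, caption_emb, known_frame_emb):
        # caption_emb: (B, T_text, D); known_frame_emb: (B, T_known, D)
        query = self.query.expand(caption_emb.size(0), -1, -1)
        tokens = torch.cat([caption_emb, known_frame_emb, query], dim=1)
        return self.encoder(tokens)[:, -1:, :]  # (B, 1, D) predicted frame embedding


class RichContextDenoiser(nn.Module):
    """Stage 2: denoise the unknown frame latent under rich contextual conditions
    (reference-image latents, the predicted frame embedding, all caption embeddings)."""

    def __init__(self, latent_ch=4, dim=768):
        super().__init__()
        self.unet = nn.Conv2d(latent_ch * 2, latent_ch, 3, padding=1)  # stand-in for a conditional U-Net
        self.to_bias = nn.Linear(dim, latent_ch)                       # feature-level injection

    def forward(self, noisy_latent, ref_latent, frame_prior, caption_emb):
        context = torch.cat([frame_prior, caption_emb], dim=1).mean(dim=1)  # (B, D)
        bias = self.to_bias(context)[:, :, None, None]                      # (B, C, 1, 1)
        x = torch.cat([noisy_latent, ref_latent], dim=1)                    # image-level injection
        return self.unet(x) + bias                                          # predicted noise


# Toy usage with random tensors
prior, denoiser = FramePriorTransformer(), RichContextDenoiser()
frame_prior = prior(torch.randn(2, 77, 768), torch.randn(2, 4, 768))
noise = denoiser(torch.randn(2, 4, 32, 32), torch.randn(2, 4, 32, 32),
                 frame_prior, torch.randn(2, 77, 768))
```

Because both stages condition on all captions and the known clip at once, the unknown frames can be generated in a single forward pass rather than one frame at a time.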
Dataset preparation follows the workflow outlined in ARLDM.
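As a quick sanity check after preprocessing, you can inspect the resulting HDF5 file. The snippet below is a sketch that assumes the ARLDM-style layout (one group per split, per-frame image datasets, and `|`-joined captions); the file path and key names are assumptions, so check your generated file for the exact keys.

```python
# Sanity check for an ARLDM-style HDF5 file (key names such as "train",
# "image0", and "text" are assumptions based on the ARLDM preprocessing
# scripts -- inspect your own file for the exact layout).
import h5py

with h5py.File("pororo.h5", "r") as f:           # path is illustrative
    print(list(f.keys()))                         # e.g. ['test', 'train', 'val']
    split = f["train"]
    print(list(split.keys()))                     # e.g. ['image0', ..., 'image4', 'text']
    captions = split["text"][0].decode("utf-8").split("|")  # captions of the first story
    print(len(captions), captions[0])
```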
- Python >= 3.8 (Recommend to use Anaconda or Miniconda)
- PyTorch >= 2.0.0
- CUDA == 11.8
```bash
conda create --name rcdms python=3.8.10
conda activate rcdms
pip install -U pip

# Install requirements
pip install -r requirements.txt
```
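After installation, an optional quick check that the environment matches the versions listed above:

```python
# Optional environment check (only verifies the versions listed above).
import torch

print(torch.__version__)          # expect >= 2.0.0
print(torch.version.cuda)         # expect 11.8
print(torch.cuda.is_available())  # True if a CUDA GPU is visible
```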
Training (choose the script for your dataset):
```bash
# stage1
sh run_stage1_PororoSV.sh          # PororoSV
sh run_stage1_FlintstonesSV.sh     # FlintstonesSV

# stage2
sh run_stage2_PororoSV.sh          # PororoSV
sh run_stage2_FlintstonesSV.sh     # FlintstonesSV
```
Testing:
```bash
# stage1
python3 stage1_batchtest_rcdms_model.py

# stage2
python3 stage2_batchtest_rcdms_model.py
```

If you find RCDMs useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{shen2024boosting,
  title={Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models},
  author={Shen, Fei and Ye, Hu and Liu, Sibo and Zhang, Jun and Wang, Cong and Han, Xiao and Yang, Wei},
  journal={arXiv preprint arXiv:2407.02482},
  year={2024}
}
```
- IMAGEdit: Training-free controllable video editing with consistent object layout. [Controllable multi-object video editing]
- IMAGDressing: Controllable dressing generation. [Controllable dressing generation]
- IMAGGarment: Fine-grained controllable garment generation. [Controllable garment generation]
- IMAGHarmony: Controllable image editing with consistent object layout. [Controllable multi-object image editing]
- IMAGPose: Pose-guided person generation with high fidelity. [Controllable multi-mode person generation]
- RCDMs: Rich-contextual conditional diffusion for story visualization. [Controllable story generation]
- PCDMs: Progressive conditional diffusion for pose-guided image synthesis. [Controllable person generation]
- V-Express: Explores strong and weak conditional relationships for portrait video generation. [Controllable digital human generation]
- FaceShot: Talking-face plugin for any character. [Controllable anime digital human generation]
- CharacterShot: Controllable and consistent 4D character animation framework. [Controllable 4D character generation]
- StyleTailor: An agent for personalized fashion styling. [Personalized fashion agent]
- SignVip: Controllable sign language video generation. [Controllable sign language generation]
If you have any questions, please feel free to contact me at [email protected].



