Cheng Lei12†, Jiayu Zhang2†‡, Yue Ma3*, Xinyu Wang4, Long Chen2, Liang Tang2, Yiqiang Yan2, Fei Su1, Zhicheng Zhao1*
1 Beijing University of Posts and Telecommunications, 2 Lenovo, 3 HKUST, 4 Tsinghua University
†Equal Contribution ‡ Project Lead *Corresponding Author
For more examples, please refer to our project page (https://xduzhangjiayu.github.io/DiTraj_Project_Page/).
[2025.9.29] Paper released!
[2025.12.10] Code released!
- Release Paper on arxiv
- Release Code
- Release Gradio demo with user-friendly interaction
Clone the repo:

```shell
git clone https://github.com/xduzhangjiayu/DiTraj.git
```
Then create the environment and install the dependencies:

```shell
conda create --name DiTraj python=3.11
conda activate DiTraj
pip install -r requirements.txt
```
Install diffusers v0.33.1 from source:

```shell
git clone --branch v0.33.1 https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
```
Finally, replace `./diffusers/src/diffusers/models/transformers/transformer_wan.py` with the `./module/transformer_wan.py` file from this repo, e.g. `cp ./module/transformer_wan.py ./diffusers/src/diffusers/models/transformers/transformer_wan.py`.
- First, put your prompts in `test_prompts.txt`.
- Then, run the following commands:

```shell
python prompt_extend.py   # optional
python prompt_refine.py
```

This generates `demo/test_prompts_refined.json`, which contains the background/foreground (bg/fg) prompts.
- Define your trajectory in `run.py` (line 15). You can set the bbox at several keyframes: (x1, y1) is the top-left corner of the bbox and (x2, y2) is the bottom-right corner. Each keyframe uses the format `[frame_id, y1, y2, x1, x2]`. For example:

```python
bboxs = [
    [0, 0.3, 0.7, 0.1, 0.4],   # frame 0: left side
    [80, 0.3, 0.7, 0.7, 1.0],  # frame 80: right side
]
```
If you want a more complex trajectory, you can use something like the following:

```python
bboxs = [
    [0, 0.05, 0.55, 0.05, 0.45],   # frame 0: top-left
    [20, 0.05, 0.55, 0.55, 0.95],  # frame 20: top-right
    [40, 0.45, 0.95, 0.55, 0.95],  # frame 40: bottom-right
    [60, 0.45, 0.95, 0.05, 0.45],  # frame 60: bottom-left
    [80, 0.05, 0.55, 0.05, 0.45],  # frame 80: back to top-left
]
```
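Between keyframes, per-frame boxes are presumably obtained by interpolation inside `run.py`. As a rough mental model only (a minimal sketch, not the repo's actual code), linear interpolation over `[y1, y2, x1, x2]` looks like this:

```python
def interpolate_bbox(bboxs, frame_id):
    """Linearly interpolate a [y1, y2, x1, x2] box between keyframes.

    `bboxs` follows the [frame_id, y1, y2, x1, x2] format above,
    sorted by frame id; frames outside the range clamp to the ends.
    """
    if frame_id <= bboxs[0][0]:
        return bboxs[0][1:]
    if frame_id >= bboxs[-1][0]:
        return bboxs[-1][1:]
    for (f0, *b0), (f1, *b1) in zip(bboxs, bboxs[1:]):
        if f0 <= frame_id <= f1:
            t = (frame_id - f0) / (f1 - f0)  # fractional position between keyframes
            return [a + t * (b - a) for a, b in zip(b0, b1)]

bboxs = [
    [0, 0.3, 0.7, 0.1, 0.4],   # frame 0: left side
    [80, 0.3, 0.7, 0.7, 1.0],  # frame 80: right side
]
print(interpolate_bbox(bboxs, 40))  # halfway through, box is halfway across
```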
- Run the following command:

```shell
python run.py
```
- The videos will be saved to `demo/output.mp4` and `demo/output_box.mp4` (the latter with the bbox drawn).
Our codebase builds on diffusers; thanks for the great work!
If you find our work helpful, please star 🌟 this repo and cite 📑 our paper. Thanks for your support!
@misc{lei2025ditrajtrainingfreetrajectorycontrol,
title={DiTraj: training-free trajectory control for video diffusion transformer},
author={Cheng Lei and Jiayu Zhang and Yue Ma and Xinyu Wang and Long Chen and Liang Tang and Yiqiang Yan and Fei Su and Zhicheng Zhao},
year={2025},
eprint={2509.21839},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.21839},
}
This code is licensed under CC BY-NC 4.0 and intended for research use only — no commercial use allowed.