Omniscient Driving Navigation World Models
Bohan Li, Zhuang Ma, Dalong Du, Baorui Peng, Zhujin Liang, Zhenqiang Liu, Chao Ma, Yueming Jin, Hao Zhao, Wenjun Zeng, Xin Jin
OmniNWM is a unified panoramic navigation world model that advances autonomous driving simulation by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control via normalized Plücker ray-maps, and facilitating closed-loop evaluation through occupancy-based dense rewards.
| Feature | Description |
|---|---|
| Multi-modal Generation | Jointly generates RGB, semantic, depth, and 3D occupancy in panoramic views |
| Precise Camera Control | Normalized Plücker ray-maps for pixel-level trajectory interpretation |
| Long-term Stability | Flexible forcing strategy enables auto-regressive generation beyond GT length |
| Closed-loop Evaluation | Occupancy-based dense rewards enable realistic driving policy evaluation |
| Zero-shot Generalization | Transfers across datasets and camera configurations without fine-tuning |
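The normalized Plücker ray-maps mentioned above encode each pixel's viewing ray as a 6D Plücker coordinate (a unit direction `d` and a moment `m = o × d`, where `o` is the camera center), which is what lets the model interpret trajectories at pixel level. Below is a minimal NumPy sketch of the idea; the function name, the pinhole intrinsics `K`, and the unit-norm normalization are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def plucker_ray_map(K, cam_to_world, H, W):
    """Per-pixel Plucker coordinates (d, m) with m = o x d.

    Illustrative sketch: unit-norm directions in the world frame;
    OmniNWM's exact normalization scheme may differ.
    """
    # Pixel grid sampled at pixel centers
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Back-project to camera-frame rays, then rotate into the world frame
    dirs_cam = pix @ np.linalg.inv(K).T                        # (H, W, 3)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs = dirs_cam @ R.T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)       # unit direction d
    moment = np.cross(np.broadcast_to(t, dirs.shape), dirs)    # m = o x d
    return np.concatenate([dirs, moment], axis=-1)             # (H, W, 6)
```

Because the moment is a cross product with the camera center, `d · m = 0` holds for every pixel, so the 6D map is a valid Plücker embedding of the ray bundle.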
> **Manual Patch Required:** After installation, you must manually patch `transformers` for compatibility. See step 4 below.
1. Clone the repository

   ```bash
   git clone https://github.com/Ma-Zhuang/OmniNWM.git
   cd OmniNWM
   ```

2. Create directories

   ```bash
   mkdir -p pretrained data
   ```

3. Install dependencies (recommended: `torch >= 2.4.0`)

   ```bash
   pip install -v -e .
   pip install "huggingface_hub[cli]"
   ```

4. Apply the patch: locate `transformers/modeling_utils.py` (usually under `site-packages` in your conda env) and modify the version check:

   ```python
   # Find this line:
   if self._tp_plan is not None and is_torch_greater_or_equal("2.3"):
   # Change it to:
   if self._tp_plan is not None and is_torch_greater_or_equal("2.5"):
   ```
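If you prefer not to edit the file by hand, the same one-line change can be scripted. This is a sketch under the assumption that the version-check line appears verbatim as shown above; it keeps a `.bak` backup and the helper name is ours, not part of the repo:

```python
import pathlib

OLD = 'if self._tp_plan is not None and is_torch_greater_or_equal("2.3"):'
NEW = 'if self._tp_plan is not None and is_torch_greater_or_equal("2.5"):'

def patch_version_check(path: pathlib.Path) -> bool:
    """Replace the torch version check in-place; return True if patched."""
    text = path.read_text()
    if OLD not in text:
        return False  # already patched, or a different transformers version
    path.with_name(path.name + ".bak").write_text(text)  # keep a backup
    path.write_text(text.replace(OLD, NEW))
    return True

# Usage against the installed package:
#   import transformers
#   patch_version_check(
#       pathlib.Path(transformers.__file__).with_name("modeling_utils.py"))
```

If the pattern is not found, check your `transformers` version before forcing the edit.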
Download the official checkpoints and auxiliary models.
OmniNWM Weights:

```bash
pip install "huggingface_hub[cli]"
huggingface-cli download Arlolo0/OmniNWM --local-dir ./pretrained
```

Open-Sora-v2 Weights:

```bash
huggingface-cli download hpcai-tech/Open-Sora-v2 --local-dir ./pretrained
```

- nuScenes Dataset: download the Trainval splits (Full dataset v1.0) from the official website and place them in `./data/nuscenes`.
- Depth Annotations: download from HuggingFace.
- Segmentation Annotations: download from HuggingFace.
Expected Directory Structure:
```
OmniNWM
├── assets
├── build
├── configs
├── data
│   ├── nuscenes
│   │   ├── samples
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   │   ├── sweeps
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   ├── nuscenes_12hz_depth_unzip
│   │   ├── adf04...
│   │   ├── adf06...
│   │   ├── ...
│   │   ├── ecd00...
│   ├── nuscenes_seg
│   │   ├── samples_seg
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   │   ├── sweeps_seg
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   ├── nuscenes_interp_12Hz_infos_train_with_bid_caption.pkl
│   ├── nuscenes_interp_12Hz_infos_val_with_bid_caption.pkl
├── omninwm
├── pretrained
│   ├── hunyuan_vae.safetensors
│   ├── occ.pth
│   ├── Open_Sora_v2.safetensors
├── tools
```
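Before launching training or inference, it can save time to sanity-check that the downloads landed where expected. The following small checker mirrors the key paths from the tree above (the list is illustrative, not exhaustive, and the helper name is ours):

```python
import pathlib

# Key data/checkpoint paths from the layout above (not exhaustive)
EXPECTED = [
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes_12hz_depth_unzip",
    "data/nuscenes_seg/samples_seg",
    "data/nuscenes_seg/sweeps_seg",
    "data/nuscenes_interp_12Hz_infos_train_with_bid_caption.pkl",
    "data/nuscenes_interp_12Hz_infos_val_with_bid_caption.pkl",
    "pretrained/hunyuan_vae.safetensors",
    "pretrained/occ.pth",
    "pretrained/Open_Sora_v2.safetensors",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = pathlib.Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

# Usage from the repo root:
#   missing = missing_paths()
#   print("All paths present." if not missing else "\n".join(missing))
```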
Check OmniNWM-VLA for more implementation details, including:
- Integrated Tri-MMI for tri-modal fusion
- ShareGPT format dataset generation pipeline
- Codebase setup with nuScenes support
Generate videos from trajectories. Ensure you update the checkpoint path in `configs/inference/infer.py` before running.
| Task | Command | Description |
|---|---|---|
| Standard Inference | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer.py` | Multi-GPU, nuScenes 448×800, 6 cams, 33 frames |
| OOD nuPlan Inference | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer_nuplan.py` | nuPlan dataset, manual trajectory input |
| VLA Closed-Loop Test | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer_with_occ_vla.py` | Closed-loop test with occupancy prediction (321 frames) |
Training is divided into three stages for stability.
```bash
# Stage 1: Small resolution, short video, single-modal output
bash dist_train_mlp.sh configs/train/stage_1.py

# Stage 2: Small resolution, short video, multi-modal output
bash dist_train_mlp.sh configs/train/stage_2.py

# Stage 3: High resolution, long/short video, multi-modal output
bash dist_train_mlp.sh configs/train/stage_3.py
```

If you find OmniNWM useful for your research, please consider citing:
```bibtex
@article{li2025omninwm,
  title={OmniNWM: Omniscient Driving Navigation World Models},
  author={Li, Bohan and Ma, Zhuang and Du, Dalong and Peng, Baorui and Liang, Zhujin and Liu, Zhenqiang and Ma, Chao and Jin, Yueming and Zhao, Hao and Zeng, Wenjun and others},
  journal={arXiv preprint arXiv:2510.18313},
  year={2025}
}
```

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Built upon excellent open-source projects including OpenSora and Qwen-VL.
🌟 Star us on GitHub if you like this project! 🌟

