Omniscient Driving Navigation World Models
Bohan Li, Zhuang Ma, Dalong Du, Baorui Peng, Zhujin Liang, Zhenqiang Liu, Chao Ma, Yueming Jin, Hao Zhao, Wenjun Zeng, Xin Jin
OmniNWM is a unified panoramic navigation world model that advances autonomous driving simulation by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control via normalized Plücker ray-maps, and facilitating closed-loop evaluation through occupancy-based dense rewards.
| Feature | Description |
|---|---|
| Multi-modal Generation | Jointly generates RGB, semantic, depth, and 3D occupancy in panoramic views |
| Precise Camera Control | Normalized Plücker ray-maps for pixel-level trajectory interpretation |
| Long-term Stability | Flexible forcing strategy enables auto-regressive generation beyond GT length |
| Closed-loop Evaluation | Occupancy-based dense rewards enable realistic driving policy evaluation |
| Zero-shot Generalization | Transfers across datasets and camera configurations without fine-tuning |
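The normalized Plücker ray-maps mentioned above encode each pixel's viewing ray as a 6D Plücker coordinate (a unit direction `d` and a moment `m = o × d`, where `o` is the camera center), which is what lets the model interpret trajectories at pixel level. Below is a minimal NumPy sketch of the idea; the function name, the pinhole intrinsics `K`, and the unit-norm normalization are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def plucker_ray_map(K, cam_to_world, H, W):
    """Per-pixel Plucker coordinates (d, m) with m = o x d.

    Illustrative sketch: unit-norm directions in the world frame;
    OmniNWM's exact normalization scheme may differ.
    """
    # Pixel grid sampled at pixel centers
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Back-project to camera-frame rays, then rotate into the world frame
    dirs_cam = pix @ np.linalg.inv(K).T                        # (H, W, 3)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs = dirs_cam @ R.T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)       # unit direction d
    moment = np.cross(np.broadcast_to(t, dirs.shape), dirs)    # m = o x d
    return np.concatenate([dirs, moment], axis=-1)             # (H, W, 6)
```

Because the moment is a cross product with the camera center, `d · m = 0` holds for every pixel, so the 6D map is a valid Plücker embedding of the ray bundle.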
> **Manual Patch Required:** After installation, you must manually patch `transformers` for compatibility. See step 4 below.
1. Clone the repository

   ```bash
   git clone https://github.com/Ma-Zhuang/OmniNWM.git
   cd OmniNWM
   ```

2. Create directories

   ```bash
   mkdir -p pretrained data
   ```

3. Install dependencies (recommended: `torch >= 2.4.0`)

   ```bash
   pip install -v -e .
   pip install "huggingface_hub[cli]"
   ```

4. Apply the patch: locate `transformers/modeling_utils.py` (usually under `site-packages` in your conda env) and modify the version check:

   ```python
   # Find this line:
   if self._tp_plan is not None and is_torch_greater_or_equal("2.3"):
   # Change it to:
   if self._tp_plan is not None and is_torch_greater_or_equal("2.5"):
   ```
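If you prefer not to edit the file by hand, the same one-line change can be scripted. This is a sketch under the assumption that the version-check line appears verbatim as shown above; it keeps a `.bak` backup and the helper name is ours, not part of the repo:

```python
import pathlib

OLD = 'if self._tp_plan is not None and is_torch_greater_or_equal("2.3"):'
NEW = 'if self._tp_plan is not None and is_torch_greater_or_equal("2.5"):'

def patch_version_check(path: pathlib.Path) -> bool:
    """Replace the torch version check in-place; return True if patched."""
    text = path.read_text()
    if OLD not in text:
        return False  # already patched, or a different transformers version
    path.with_name(path.name + ".bak").write_text(text)  # keep a backup
    path.write_text(text.replace(OLD, NEW))
    return True

# Usage against the installed package:
#   import transformers
#   patch_version_check(
#       pathlib.Path(transformers.__file__).with_name("modeling_utils.py"))
```

If the pattern is not found, check your `transformers` version before forcing the edit.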
Download the official checkpoints and auxiliary models.
OmniNWM Weights:

```bash
pip install "huggingface_hub[cli]"
huggingface-cli download Arlolo0/OmniNWM --local-dir ./pretrained
```

Open-Sora-v2 Weights:

```bash
huggingface-cli download hpcai-tech/Open-Sora-v2 --local-dir ./pretrained
```

- nuScenes Dataset: download the Trainval splits (Full dataset v1.0) from the official website and place them in `./data/nuscenes`.
- Depth Annotations: download from HuggingFace.
- Segmentation Annotations: download from HuggingFace.
Expected Directory Structure:
```
OmniNWM
├── assets
├── build
├── configs
├── data
│   ├── nuscenes
│   │   ├── samples
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   │   ├── sweeps
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   ├── nuscenes_12hz_depth_unzip
│   │   ├── adf04...
│   │   ├── adf06...
│   │   ├── ...
│   │   ├── ecd00...
│   ├── nuscenes_seg
│   │   ├── samples_seg
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   │   ├── sweeps_seg
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   ├── nuscenes_interp_12Hz_infos_train_with_bid_caption.pkl
│   ├── nuscenes_interp_12Hz_infos_val_with_bid_caption.pkl
├── omninwm
├── pretrained
│   ├── hunyuan_vae.safetensors
│   ├── occ.pth
│   ├── Open_Sora_v2.safetensors
├── tools
```
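Before launching training or inference, it can save time to sanity-check that the downloads landed where expected. The following small checker mirrors the key paths from the tree above (the list is illustrative, not exhaustive, and the helper name is ours):

```python
import pathlib

# Key data/checkpoint paths from the layout above (not exhaustive)
EXPECTED = [
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes_12hz_depth_unzip",
    "data/nuscenes_seg/samples_seg",
    "data/nuscenes_seg/sweeps_seg",
    "data/nuscenes_interp_12Hz_infos_train_with_bid_caption.pkl",
    "data/nuscenes_interp_12Hz_infos_val_with_bid_caption.pkl",
    "pretrained/hunyuan_vae.safetensors",
    "pretrained/occ.pth",
    "pretrained/Open_Sora_v2.safetensors",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = pathlib.Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

# Usage from the repo root:
#   missing = missing_paths()
#   print("All paths present." if not missing else "\n".join(missing))
```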
Check OmniNWM-VLA for more implementation details, including:
- Integrated Tri-MMI for tri-modal fusion
- ShareGPT format dataset generation pipeline
- Codebase setup with nuScenes support
Generate videos from trajectories. Ensure you update the checkpoint path in `configs/inference/infer.py` before running.
| Task | Command | Description |
|---|---|---|
| Standard Inference | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer.py` | Multi-GPU, nuScenes 448×800, 6 cams, 33 frames |
| OOD nuPlan Inference | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer_nuplan.py` | nuPlan dataset, manual trajectory input |
| VLA Closed-Loop Test | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer_with_occ_vla.py` | Closed-loop test with occupancy prediction (321 frames) |
Training is divided into three stages for stability.
```bash
# Stage 1: Small resolution, short video, single-modal output
bash dist_train_mlp.sh configs/train/stage_1.py

# Stage 2: Small resolution, short video, multi-modal output
bash dist_train_mlp.sh configs/train/stage_2.py

# Stage 3: High resolution, long/short video, multi-modal output
bash dist_train_mlp.sh configs/train/stage_3.py
```

If you find OmniNWM useful for your research, please consider citing:
```bibtex
@article{li2025omninwm,
  title={OmniNWM: Omniscient Driving Navigation World Models},
  author={Li, Bohan and Ma, Zhuang and Du, Dalong and Peng, Baorui and Liang, Zhujin and Liu, Zhenqiang and Ma, Chao and Jin, Yueming and Zhao, Hao and Zeng, Wenjun and others},
  journal={arXiv preprint arXiv:2510.18313},
  year={2025}
}
```

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Built upon excellent open-source projects including OpenSora and Qwen-VL.
🌟 Star us on GitHub if you like this project! 🌟

