OmniNWM

Omniscient Driving Navigation World Models

Paper Project Page Huggingface

Bohan Li, Zhuang Ma, Dalong Du, Baorui Peng, Zhujin Liang, Zhenqiang Liu, Chao Ma, Yueming Jin, Hao Zhao, Wenjun Zeng, Xin Jin

OmniNWM is a unified panoramic navigation world model that advances autonomous driving simulation by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control via normalized Plücker ray-maps, and facilitating closed-loop evaluation through occupancy-based dense rewards.


Teaser


✨ Key Features

| Feature | Description |
| --- | --- |
| Multi-modal Generation | Jointly generates RGB, semantic, depth, and 3D occupancy in panoramic views |
| Precise Camera Control | Normalized Plücker ray-maps for pixel-level trajectory interpretation |
| Long-term Stability | Flexible forcing strategy enables auto-regressive generation beyond GT length |
| Closed-loop Evaluation | Occupancy-based dense rewards enable realistic driving policy evaluation |
| Zero-shot Generalization | Transfers across datasets and camera configurations without fine-tuning |
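
The Plücker ray-map encoding used for camera control can be sketched in a few lines. This is an illustrative reimplementation based on the standard Plücker parameterization (unit direction `d` plus moment `m = o × d` for a camera centered at `o`), not the repository's code; `plucker_ray_map` and its argument conventions are assumptions:

```python
import numpy as np

def plucker_ray_map(K, R, t, h, w):
    """Build a per-pixel Plucker ray-map for an h x w image.

    K: 3x3 intrinsics, R: 3x3 world-from-camera rotation,
    t: camera center in world coordinates, shape (3,).
    Returns an (h, w, 6) array: unit direction d and moment m = t x d.
    """
    K_inv = np.linalg.inv(K)
    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # (h, w, 3)
    d = pix @ K_inv.T @ R.T                                  # rays in world frame
    d /= np.linalg.norm(d, axis=-1, keepdims=True)           # normalize direction
    m = np.cross(np.broadcast_to(t, d.shape), d)             # Plucker moment
    return np.concatenate([d, m], axis=-1)
```

Because the moment depends on the camera center, the 6-channel map encodes both the viewing direction and the camera pose of every pixel, which is what makes pixel-level trajectory conditioning possible.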

🏗️ Architecture

Architecture


🛠️ Quickstart

1. Installation

Manual Patch Required: After installation, you must manually patch transformers for compatibility. See step 4 below.

  1. Clone the repository

    git clone https://github.com/Ma-Zhuang/OmniNWM.git
    cd OmniNWM
  2. Create directories

    mkdir -p pretrained data
  3. Install dependencies (Recommended: torch >= 2.4.0)

    pip install -v -e .
    pip install "huggingface_hub[cli]"
  4. Apply the patch: locate transformers/modeling_utils.py (usually under your conda environment's site-packages) and modify the version check:

    # Find this line:
    if self._tp_plan is not None and is_torch_greater_or_equal("2.3"):
    
    # Change to:
    if self._tp_plan is not None and is_torch_greater_or_equal("2.5"):
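
If you prefer to apply the edit programmatically, the substitution can be expressed as a small helper. This is a convenience sketch, not part of the repository; it assumes the version check appears verbatim as shown above:

```python
def patch_tp_plan_check(text: str) -> str:
    """Bump the torch version gate in the _tp_plan check from "2.3" to "2.5".

    Operates on file contents as a string so the change can be previewed
    before writing anything back into site-packages.
    """
    old = 'if self._tp_plan is not None and is_torch_greater_or_equal("2.3"):'
    new = old.replace('"2.3"', '"2.5"')
    if old not in text:
        # The transformers version installed may have a different layout.
        raise ValueError("expected version check not found; patch manually")
    return text.replace(old, new, 1)
```

To find the file to patch, `python -c "import transformers.modeling_utils as m; print(m.__file__)"` prints its location.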

2. Model Download

Download the official checkpoints and auxiliary models.

OmniNWM Weights:

pip install "huggingface_hub[cli]"
huggingface-cli download Arlolo0/OmniNWM --local-dir ./pretrained

Open-Sora-v2 Weights:

huggingface-cli download hpcai-tech/Open-Sora-v2 --local-dir ./pretrained

3. Data Preparation

  1. nuScenes Dataset: Download the Trainval split (Full dataset v1.0) from the official website and place it in ./data/nuscenes.
  2. Depth Annotations: Download from HuggingFace.
  3. Segmentation Annotations: Download from HuggingFace.

Expected Directory Structure:

OmniNWM
├── assets
├── build
├── configs
├── data
│   ├── nuscenes
│   │   ├── samples
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   │   ├── sweeps
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   ├── nuscenes_12hz_depth_unzip
│   │   ├── adf04...
│   │   ├── adf06...
│   │   ├── ...
│   │   ├── ecd00...
│   ├── nuscenes_seg
│   │   ├── samples_seg
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   │   ├── sweeps_seg
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── ...
│   │   │   ├── CAM_FRONT_RIGHT
│   ├── nuscenes_interp_12Hz_infos_train_with_bid_caption.pkl
│   ├── nuscenes_interp_12Hz_infos_val_with_bid_caption.pkl
├── omninwm
├── pretrained
│   ├── hunyuan_vae.safetensors
│   ├── occ.pth
│   ├── Open_Sora_v2.safetensors
├── tools
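
A quick sanity check against the layout above can save a failed run later. The snippet below is a hypothetical helper (not part of the repository) that reports which expected paths are missing:

```python
from pathlib import Path

# Key paths from the expected directory structure above.
EXPECTED = [
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes_12hz_depth_unzip",
    "data/nuscenes_seg/samples_seg",
    "data/nuscenes_seg/sweeps_seg",
    "pretrained/hunyuan_vae.safetensors",
    "pretrained/occ.pth",
    "pretrained/Open_Sora_v2.safetensors",
]

def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Run `missing_paths(".")` from the repository root; an empty list means the data and checkpoints are where the configs expect them.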

4. OmniNWM-VLA

Check OmniNWM-VLA for more implementation details, including:

  • Integrated Tri-MMI for tri-modal fusion
  • ShareGPT-format dataset generation pipeline
  • Codebase setup with nuScenes support

🚀 Usage

Inference (Trajectory-to-Video)

Generate videos from trajectories. Ensure you update the checkpoint path in configs/inference/infer.py before running.

| Task | Command | Description |
| --- | --- | --- |
| Standard Inference | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer.py` | Multi-GPU, nuScenes 448x800, 6 cams, 33 frames |
| OOD nuPlan Inference | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer_nuplan.py` | nuPlan dataset, manual trajectory input |
| VLA Closed-Loop Test | `torchrun --nproc-per-node 8 tools/inference.py configs/inference/infer_with_occ_vla.py` | Closed-loop test with occupancy prediction (321 frames) |
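
The closed-loop test scores driving policies with occupancy-based dense rewards. One plausible reward shape can be sketched as follows; this is a minimal illustration assuming a sparse occupancy representation and per-step trajectory samples, with all names and constants hypothetical rather than the repository's API:

```python
def occupancy_reward(occ_grid, trajectory, free_bonus=1.0, collision_penalty=-10.0):
    """Dense per-step reward from a predicted occupancy grid (sketch).

    occ_grid: dict mapping (x, y, z) voxel indices to True if occupied.
    trajectory: iterable of (x, y, z) voxel indices the ego vehicle visits.
    Returns one reward per step: a bonus in free space, a penalty on collision.
    """
    rewards = []
    for voxel in trajectory:
        if occ_grid.get(tuple(voxel), False):
            rewards.append(collision_penalty)  # ego enters an occupied voxel
        else:
            rewards.append(free_bonus)         # ego stays in free space
    return rewards
```

Because every step of the generated rollout yields a reward, the evaluation signal is dense rather than limited to terminal success/failure.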

Training

Training is divided into stages for stability.

# Stage 1: Small resolution, short video, single-modal output
bash dist_train_mlp.sh configs/train/stage_1.py

# Stage 2: Small resolution, short video, multi-modal output
bash dist_train_mlp.sh configs/train/stage_2.py

# Stage 3: High resolution, long/short video, multi-modal output
bash dist_train_mlp.sh configs/train/stage_3.py

📚 Citation

If you find OmniNWM useful for your research, please consider citing:

@article{li2025omninwm,
  title={OmniNWM: Omniscient Driving Navigation World Models},
  author={Li, Bohan and Ma, Zhuang and Du, Dalong and Peng, Baorui and Liang, Zhujin and Liu, Zhenqiang and Ma, Chao and Jin, Yueming and Zhao, Hao and Zeng, Wenjun and others},
  journal={arXiv preprint arXiv:2510.18313},
  year={2025}
}

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

❤️ Acknowledgments

Built upon excellent open-source projects including OpenSora and Qwen-VL.

🌟 Star us on GitHub if you like this project! 🌟
