GitHub - nvidia-cosmos/cosmos-predict1: Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.

Product Website | Hugging Face | Paper | Paper Website

Cosmos-Predict1 is a key branch of Cosmos World Foundation Models (WFMs) specialized for future state prediction; such models are often referred to as world models. The three main branches of Cosmos WFMs are cosmos-predict, cosmos-transfer, and cosmos-reason. We visualize the architecture of Cosmos-Predict1 in the following figure.

Cosmos-Predict1 includes the following:

Diffusion-based world foundation models for Text2World and Video2World generation, where a user can generate visual simulations from text prompts and video prompts.
Autoregressive world foundation models for Video2World generation, where a user can generate visual simulations from video prompts and optional text prompts.
Image and video tokenizers that efficiently and effectively convert images and videos into continuous tokens (latent vectors) or discrete tokens (integers).
Post-training scripts that help Physical AI builders post-train the pre-trained Cosmos-Predict1 models for their own applications.
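The input modes listed above can be summarized as a small lookup of which prompts each model family and task expects. The sketch below is illustrative only: `REQUIRED`, `ALLOWED`, and `check_prompts` are hypothetical names, not part of the cosmos-predict1 codebase, and the rules simply mirror the list as written. The actual inference scripts may accept other inputs (for example, images).

```python
from typing import Optional

# Hypothetical prompt rules mirroring the list above; NOT the cosmos-predict1 API.
# Key: (model family, task) -> prompt types that must be supplied.
REQUIRED = {
    ("diffusion", "Text2World"): {"text"},
    ("diffusion", "Video2World"): {"text", "video"},
    ("autoregressive", "Video2World"): {"video"},
}

# Prompt types that may be supplied (the autoregressive text prompt is optional).
ALLOWED = {
    ("diffusion", "Text2World"): {"text"},
    ("diffusion", "Video2World"): {"text", "video"},
    ("autoregressive", "Video2World"): {"text", "video"},
}


def check_prompts(
    family: str,
    task: str,
    text: Optional[str] = None,
    video: Optional[str] = None,
) -> list:
    """Return a list of problems with the supplied prompts (empty if valid)."""
    key = (family, task)
    if key not in REQUIRED:
        raise ValueError(f"unsupported combination: {family} {task}")
    # Empty strings are treated the same as missing prompts.
    given = {name for name, value in (("text", text), ("video", video)) if value}
    missing = sorted(REQUIRED[key] - given)
    unexpected = sorted(given - ALLOWED[key])
    return [f"missing {n} prompt" for n in missing] + [
        f"unexpected {n} prompt" for n in unexpected
    ]


print(check_prompts("autoregressive", "Video2World", video="input.mp4"))  # []
print(check_prompts("diffusion", "Text2World", video="input.mp4"))
# ['missing text prompt', 'unexpected video prompt']
```

Keeping required and allowed prompts in separate tables makes the optional text prompt of the autoregressive models explicit without special-casing it in the function body.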

Example Model Behavior

Cosmos-Predict Text2World


Cosmos-Predict Video2World


Getting Started

We provide a comprehensive set of examples that illustrate how to perform inference, post-training, and other tasks with Cosmos-Predict1. Click a relevant example below to start your Cosmos journey.

Installation

Please refer to INSTALL.md for general instructions on environment setup.

Cosmos-Predict1 Models

Cosmos-Predict1 includes the following models:

Diffusion models

Cosmos-Predict1-7B-Text2World: Text to visual world generation
Cosmos-Predict1-14B-Text2World: Text to visual world generation
Cosmos-Predict1-7B-Video2World: Future visual world generation from video and text
Cosmos-Predict1-14B-Video2World: Future visual world generation from video and text

Autoregressive models

Cosmos-Predict1-4B: Future visual world generation
Cosmos-Predict1-12B: Future visual world generation
Cosmos-Predict1-5B-Video2World: Future visual world generation from video and text
Cosmos-Predict1-13B-Video2World: Future visual world generation from video and text
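The model lists above can also be read as a catalog keyed by model family and prompt inputs. The following sketch stores the listed names in a dictionary and picks the smallest or largest variant for a given input combination. `CATALOG` and `pick_model` are hypothetical helpers for illustration, and treating the base autoregressive models (4B, 12B) as video-only is an assumption based on the description of the autoregressive Video2World models earlier in this README.

```python
# Hypothetical catalog of the models listed above, keyed by
# (model family, prompt inputs) -> {parameter size: model name}.
CATALOG = {
    ("diffusion", frozenset({"text"})): {
        "7B": "Cosmos-Predict1-7B-Text2World",
        "14B": "Cosmos-Predict1-14B-Text2World",
    },
    ("diffusion", frozenset({"video", "text"})): {
        "7B": "Cosmos-Predict1-7B-Video2World",
        "14B": "Cosmos-Predict1-14B-Video2World",
    },
    # Assumption: the base autoregressive models take video input only.
    ("autoregressive", frozenset({"video"})): {
        "4B": "Cosmos-Predict1-4B",
        "12B": "Cosmos-Predict1-12B",
    },
    ("autoregressive", frozenset({"video", "text"})): {
        "5B": "Cosmos-Predict1-5B-Video2World",
        "13B": "Cosmos-Predict1-13B-Video2World",
    },
}


def pick_model(family: str, inputs: set, prefer: str = "smallest") -> str:
    """Return the smallest or largest listed model for a family and input set."""
    key = (family, frozenset(inputs))
    if key not in CATALOG:
        raise ValueError(f"no listed model for {family} with inputs {sorted(inputs)}")
    variants = CATALOG[key]
    # Order sizes numerically by parameter count, e.g. "14B" -> 14.
    ordered = sorted(variants, key=lambda size: int(size.rstrip("B")))
    if prefer == "smallest":
        return variants[ordered[0]]
    if prefer == "largest":
        return variants[ordered[-1]]
    raise ValueError("prefer must be 'smallest' or 'largest'")


print(pick_model("diffusion", {"text"}))  # Cosmos-Predict1-7B-Text2World
print(pick_model("autoregressive", {"video", "text"}, prefer="largest"))
# Cosmos-Predict1-13B-Video2World
```

Sorting sizes numerically rather than as strings matters here: as strings, "14B" would sort before "7B".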

Tokenizers

Cosmos-Tokenize1-CV8×8×8-720p: Continuous Video Tokenizer with 8x8x8 spatio-temporal compression and a 121-frame context
Cosmos-Tokenize1-DV8×16×16-720p: Discrete Video Tokenizer with 8x16x16 spatio-temporal compression and a 49-frame context
Cosmos-Tokenize1-CI8×8-360p: Continuous Image Tokenizer with 8x8 spatial compression with low-resolution support
Cosmos-Tokenize1-CI16x16-360p: Continuous Image Tokenizer with 16x16 spatial compression with low-resolution support
Cosmos-Tokenize1-CV4×8×8-360p: Continuous Video Tokenizer with 4x8x8 spatio-temporal compression with low-resolution support
Cosmos-Tokenize1-DI8×8-360p: Discrete Image Tokenizer with 8x8 spatial compression with low-resolution support
Cosmos-Tokenize1-DI16x16-360p: Discrete Image Tokenizer with 16x16 spatial compression with low-resolution support
Cosmos-Tokenize1-DV4×8×8-360p: Discrete Video Tokenizer with 4x8x8 spatio-temporal compression with low-resolution support
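The compression factors above determine the size of the latent grid each tokenizer produces. The sketch below computes it under two assumptions that are not stated in this README: the video tokenizers are temporally causal, encoding the first frame on its own and compressing the remaining frames by the temporal factor (consistent with the 121- and 49-frame contexts, since 120 and 48 are both divisible by 8), and a 720p input is 720 x 1280 pixels. `Tokenizer`, `latent_grid`, and `num_latents` are hypothetical helpers, not the Cosmos tokenizer API.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Tokenizer:
    """Compression factors of a tokenizer (t = 1 for image tokenizers)."""
    discrete: bool  # True: integer tokens; False: continuous latent vectors
    t: int  # temporal compression
    h: int  # height compression
    w: int  # width compression


# Factors copied from the tokenizer list above; keys are ASCII shorthands.
TOKENIZERS = {
    "CV8x8x8-720p": Tokenizer(discrete=False, t=8, h=8, w=8),
    "DV8x16x16-720p": Tokenizer(discrete=True, t=8, h=16, w=16),
    "CI8x8-360p": Tokenizer(discrete=False, t=1, h=8, w=8),
    "CI16x16-360p": Tokenizer(discrete=False, t=1, h=16, w=16),
    "CV4x8x8-360p": Tokenizer(discrete=False, t=4, h=8, w=8),
    "DI8x8-360p": Tokenizer(discrete=True, t=1, h=8, w=8),
    "DI16x16-360p": Tokenizer(discrete=True, t=1, h=16, w=16),
    "DV4x8x8-360p": Tokenizer(discrete=True, t=4, h=8, w=8),
}


def latent_grid(tok: Tokenizer, frames: int, height: int, width: int) -> tuple:
    """(T', H', W') latent grid, ASSUMING causal temporal compression:
    the first frame is encoded alone and the rest are compressed by tok.t."""
    if frames < 1 or (frames - 1) % tok.t or height % tok.h or width % tok.w:
        raise ValueError("input size is not compatible with the compression factors")
    return (1 + (frames - 1) // tok.t, height // tok.h, width // tok.w)


def num_latents(tok: Tokenizer, frames: int, height: int, width: int) -> int:
    """Number of latent positions: integer tokens for discrete tokenizers,
    latent vectors for continuous ones."""
    return prod(latent_grid(tok, frames, height, width))


print(latent_grid(TOKENIZERS["CV8x8x8-720p"], 121, 720, 1280))  # (16, 90, 160)
print(num_latents(TOKENIZERS["DV8x16x16-720p"], 49, 720, 1280))  # 25200
```

Under these assumptions, a 49-frame 720p clip becomes a 7 x 45 x 80 grid of integer tokens with the DV8x16x16 tokenizer, which gives a sense of the sequence length an autoregressive model has to handle.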

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

This model includes safety and content moderation features powered by Llama Guard 3. Llama Guard 3 is used solely as a content input filter and is subject to its own license.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact [email protected].
