DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
Dogyun Park, Sihyeon Kim, Sojin Lee, Hyunwoo J. Kim†.
This repository is the official implementation of the ICLR 2024 paper DDMI (Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations).
We propose a latent diffusion model that generates hierarchically decomposed positional embeddings of implicit neural representations, enabling high-quality generation across various data domains.
To install requirements, run:
git clone https://github.com/mlvlab/DDMI.git
cd DDMI
conda create -n ddmi python==3.8
conda activate ddmi
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install accelerate omegaconf einops pyspng natsort av ema-pytorch timm ninja gdown scipy
(RECOMMENDED, Linux) Use PyTorch 2.2.0 with CUDA 11.8 so that xformers can be installed for memory-efficient computation. For 3D experiments, also install a torch-scatter build compatible with your PyTorch version.
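To check whether the optional packages are importable without actually importing them (and thus without initializing CUDA), a minimal standard-library sketch follows. It assumes the import names xformers and torch_scatter for those pip packages, and it does not verify that their builds match your PyTorch/CUDA versions.

```python
import importlib.util


def optional_packages_available(names):
    """Return {module_name: bool} telling whether each top-level module can be found.

    importlib.util.find_spec only locates a top-level module; it does not import it,
    so this check does not run package initialization code.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # xformers: memory-efficient attention; torch_scatter: needed for 3D experiments.
    for name, ok in optional_packages_available(["torch", "xformers", "torch_scatter"]).items():
        print(f"{name:>14}: {'found' if ok else 'missing'}")
```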
We use two datasets for 2D image experiments: AFHQ-V2 and CelebA-HQ. For AFHQ-V2, we use the dog and cat categories. You can change the dataset location via data_dir in the config files under configs/, and specify test_data_dir to measure r-FID during training. Each dataset should be structured as below:
Data
|-- folder
    |-- image1.png
    |-- image2.png
    |-- ...
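A quick check that data_dir matches this layout can save a failed run. The minimal sketch below uses only the standard library; the accepted extensions are an assumption (the layout above shows only .png), and the repository's own dataloader may be stricter or more permissive.

```python
from pathlib import Path

# Assumed set of accepted image extensions; the layout above only shows .png.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(data_dir):
    """Collect image files found in the sub-folders of data_dir, in sorted order.

    Expects one level of sub-folders under data_dir, each holding image files directly.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"data_dir does not exist: {root}")
    files = sorted(
        p for p in root.glob("*/*") if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not files:
        raise ValueError(f"no images found in sub-folders of {root}")
    return files
```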
For video experiments, we use the SkyTimelapse dataset with the dataloader from PVDM. You can change the dataset location via data_dir in the config files under configs/, and specify test_data_dir to measure r-FVD during training. The dataset should be structured as below:
Data
|-- train
    |-- video1
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- video2
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- ...
|-- val
    |-- video1
        |-- frame00000.png
        |-- frame00001.png
        |-- ...
    |-- ...
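The sketch below maps each video folder of a split to its frames in temporal order, which is a convenient way to confirm the layout before training. It is a standard-library illustration that assumes the frame*.png naming shown above; it is not the PVDM dataloader itself.

```python
from pathlib import Path


def list_videos(data_dir, split="train"):
    """Map each video folder under data_dir/<split>/ to its sorted frame paths.

    Sorting by filename suffices because frame indices are zero-padded
    (frame00000.png, frame00001.png, ...), so lexical order equals temporal order.
    """
    split_dir = Path(data_dir) / split
    videos = {}
    for video_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        frames = sorted(video_dir.glob("frame*.png"))
        if frames:  # skip folders without frames
            videos[video_dir.name] = frames
    return videos
```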
For 3D shape experiments, we use ShapeNet v1 with the dataloader from Occupancy Networks. You can change the dataset location via data_dir in the config files under configs/.
For NeRF experiments, we use the srn-cars dataset following pixel-NeRF; alternatively, you may download the dataset from here. You can change the dataset location via data_dir in the config files under configs/. The dataset should be structured as below:
Data
|-- cars
    |-- sampled
        |-- car00000.npz
        |-- car00001.npz
        |-- ...
To train on other signal domains, change domain in the config files under configs/ to image, occupancy, nerf, or video. Currently, a separate network is trained for each signal domain. By default, model checkpoints are stored in ./results. If training D2C-VAE in the first stage is unstable (e.g., NaN values), try increasing sn_reg_weight_decay or sn_reg_weight_decay_init in the config file to increase the weight of the spectral regularization. To resume training from a previous checkpoint, set resume to True.
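To make "weight of the spectral regularization" concrete, here is a minimal sketch of an NVAE-style spectral regularizer: each layer's largest singular value is estimated by power iteration, and the sum is scaled by a weight that is annealed from an initial to a final value. The mapping of sn_reg_weight_decay_init to the initial weight and sn_reg_weight_decay to the final weight, the log-space schedule, and all function names here are assumptions for illustration, not the repository's code.

```python
import math
import random


def spectral_norm(weight, n_iters=50, seed=0):
    """Estimate the largest singular value of a matrix (list of rows) by power iteration.

    A conv kernel would first be reshaped to (out_channels, in_channels * kh * kw).
    """
    rng = random.Random(seed)
    rows, cols = len(weight), len(weight[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    for _ in range(n_iters):
        u = [sum(w * x for w, x in zip(row, v)) for row in weight]  # u = W v
        v = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]  # keep v unit-length
    wv = [sum(w * x for w, x in zip(row, v)) for row in weight]
    return math.sqrt(sum(x * x for x in wv))  # ||W v|| converges to sigma_max


def sn_reg_weight(step, warmup_steps, init_weight, final_weight):
    """Hypothetical schedule: interpolate from init_weight to final_weight in log space."""
    alpha = min(step / warmup_steps, 1.0) if warmup_steps > 0 else 1.0
    return math.exp((1 - alpha) * math.log(init_weight) + alpha * math.log(final_weight))


def spectral_reg_loss(weights, step, warmup_steps, init_weight, final_weight):
    """Regularizer added to the VAE loss: scheduled weight times the sum of spectral norms."""
    lam = sn_reg_weight(step, warmup_steps, init_weight, final_weight)
    return lam * sum(spectral_norm(w) for w in weights)
```

Raising either weight penalizes large singular values more strongly, which limits how sharply each layer can amplify its input and tends to keep activations from blowing up to NaN. In real training one would keep the power-iteration vectors across steps and run a single iteration per step; fresh iterations here keep the sketch self-contained.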
D2C-VAE learns a latent space that maps discrete data to the positional embeddings (PEs) of a continuous function, e.g., point clouds to an occupancy function, or a pixel image to a continuous RGB image.
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --num_processes=4 main.py --exp d2c-vae --configs configs/d2c-vae/img.yaml
After training D2C-VAE, we train the latent diffusion model (LDM) on the latent space of D2C-VAE. Since the latent variable is represented as a set of 2D planes, we use a 2D convolutional UNet for the LDM across all modalities.
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --num_processes=4 main.py --exp ldm --configs configs/ldm/img.yaml
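For 3D domains, a common way to read features from a set of axis-aligned 2D planes at a continuous coordinate is the triplane lookup: project the point onto the xy, xz and yz planes, bilinearly sample each plane, and aggregate. The sketch below illustrates that idea in plain Python; the summation, the [-1, 1] coordinate range, and the align-corners convention are assumptions, and DDMI's decoder additionally builds hierarchical PEs that are not shown here.

```python
def bilinear(plane, u, v):
    """Bilinearly sample an H x W x C plane (nested lists, H, W >= 2) at (u, v) in [-1, 1].

    u indexes width and v indexes height; -1 and 1 map to the first and last
    pixel (align_corners=True convention).
    """
    h, w = len(plane), len(plane[0])
    u, v = max(-1.0, min(1.0, u)), max(-1.0, min(1.0, v))  # clamp to the plane
    fx, fy = (u + 1) / 2 * (w - 1), (v + 1) / 2 * (h - 1)
    x0, y0 = min(int(fx), w - 2), min(int(fy), h - 2)
    tx, ty = fx - x0, fy - y0
    feats = []
    for c in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][c] + tx * plane[y0][x0 + 1][c]
        bottom = (1 - tx) * plane[y0 + 1][x0][c] + tx * plane[y0 + 1][x0 + 1][c]
        feats.append((1 - ty) * top + ty * bottom)
    return feats


def triplane_features(planes, point):
    """Sum the features read from the xy, xz and yz planes at a 3D point (x, y, z)."""
    x, y, z = point
    projections = [(x, y), (x, z), (y, z)]
    per_plane = [bilinear(p, a, b) for p, (a, b) in zip(planes, projections)]
    return [sum(channel) for channel in zip(*per_plane)]
```

Because each plane is an ordinary H x W x C grid, it can also be treated as an image by a 2D UNet; how DDMI arranges the planes as UNet input is not specified in this sketch.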
In our paper, we use several evaluation metrics to assess generation quality: FID for images, MMD and COV for 3D shapes, and FVD for videos.
You can change the total number of sampling steps (NFE) by changing sampling_timesteps in the config file.
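A typical way a reduced number of sampling steps is realized is to pick an evenly spaced subset of the training timesteps, as in DDIM-style samplers. The sketch below shows that rule; whether DDMI respaces exactly this way, and the 1000-step training schedule used in the example, are assumptions.

```python
def respaced_timesteps(num_train_timesteps, sampling_timesteps):
    """Pick sampling_timesteps evenly spaced steps from a training schedule.

    Returns them in descending order, i.e. the order in which a sampler visits
    them. With one network call per step, NFE == sampling_timesteps.
    """
    if not 0 < sampling_timesteps <= num_train_timesteps:
        raise ValueError("need 0 < sampling_timesteps <= num_train_timesteps")
    stride = num_train_timesteps / sampling_timesteps  # >= 1, so int() never repeats a step
    return [int(i * stride) for i in range(sampling_timesteps)][::-1]
```

For example, respaced_timesteps(1000, 4) visits steps 750, 500, 250 and 0, trading some sample quality for a 250x reduction in network evaluations.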
To evaluate the FID of the trained 2D image model, change mode in the config file from train to eval and run:
python main.py --exp ldm --configs configs/ldm/img.yaml
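Switching mode between train, eval and gen (and likewise setting pretrained, resume, sampling_timesteps or test_resolution) is just a config edit. If you prefer to script it, the minimal sketch below overrides keys in a nested dict such as one obtained by loading a YAML config. The helper name is hypothetical, and so is any key nesting like model.sampling_timesteps; check the actual layout of the files in configs/.

```python
import copy


def override(config, **updates):
    """Return a copy of a nested config dict with (dotted) keys replaced.

    Example: override(cfg, mode="eval") or override(cfg, **{"model.sampling_timesteps": 50}).
    Raises KeyError for unknown keys so that typos are not silently ignored.
    """
    new = copy.deepcopy(config)  # leave the caller's config untouched
    for dotted, value in updates.items():
        *parents, leaf = dotted.split(".")
        node = new
        for key in parents:
            node = node[key]
        if leaf not in node:
            raise KeyError(dotted)
        node[leaf] = value
    return new
```

Since omegaconf is already a dependency, a YAML file can be loaded with OmegaConf.load, converted to a plain dict with OmegaConf.to_container, and written back after OmegaConf.create with OmegaConf.save.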
To evaluate the FVD of the trained video model, change mode in the config file from train to eval and run:
python main.py --exp ldm --configs configs/ldm/video.yaml
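For reference, FID and FVD are both Fréchet distances between two Gaussians fitted to feature embeddings of real (r) and generated (g) samples: FID uses Inception-v3 features of images, and FVD uses I3D features of videos. With feature means μ and covariances Σ (lower is better):

```latex
d^2 = \lVert \mu_r - \mu_g \rVert_2^2
    + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```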
For 3D shapes, you first need to generate occupancy functions and convert them into point clouds. First, change mode in the config file from train to eval and run the following script. The generated 3D shapes will be saved in the eval folder inside the directory specified by save_pth in the config.
python main.py --exp ldm --configs configs/ldm/occupancy.yaml
Then, run the following script to sample a point cloud of 2048 points from each mesh.
python eval_3d/meshtopc.py --pth [location of mesh files] --save_pth [save location of point clouds]
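Uniform surface sampling of a triangle mesh is typically done by choosing faces with probability proportional to their area and then drawing a uniform point inside each chosen face. The sketch below assumes the mesh is given as vertex and face lists; whether eval_3d/meshtopc.py samples exactly this way is an assumption (libraries such as trimesh expose the same operation as trimesh.sample.sample_surface).

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_surface(vertices, faces, n_points=2048, seed=0):
    """Sample n_points uniformly on the surface of a triangle mesh.

    Faces are chosen with probability proportional to area; inside a face, the
    square-root barycentric trick gives a uniformly distributed point.
    """
    rng = random.Random(seed)
    tris = [[vertices[i] for i in face] for face in faces]
    areas = [triangle_area(*t) for t in tris]
    chosen = rng.choices(tris, weights=areas, k=n_points)
    points = []
    for a, b, c in chosen:
        s, r2 = math.sqrt(rng.random()), rng.random()
        w0, w1, w2 = 1 - s, s * (1 - r2), s * r2  # barycentric weights, sum to 1
        points.append(tuple(w0 * a[i] + w1 * b[i] + w2 * c[i] for i in range(3)))
    return points
```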
Finally, run the following script to compute MMD and COV between the ground-truth point clouds and the generated point clouds.
python eval_3d/compute_metrics_3d.py --gt_pth [location of ground truth point clouds] --save_pth [location of generated point clouds]
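For reference, MMD and COV are usually defined on a pairwise distance between point clouds, commonly the Chamfer distance: MMD averages, over reference clouds, the distance to the closest generated cloud (lower is better), and COV is the fraction of reference clouds that are the nearest neighbor of at least one generated cloud (higher is better). The sketch below uses a squared-Euclidean Chamfer distance (one of several conventions) and plain loops, so it is only practical for tiny inputs; the distance and normalization used by eval_3d/compute_metrics_3d.py may differ.

```python
def chamfer(p, q):
    """Symmetric Chamfer distance: mean squared NN distance from p to q plus from q to p."""
    def one_way(a, b):
        return sum(min(sum((x - y) ** 2 for x, y in zip(pa, pb)) for pb in b) for pa in a) / len(a)
    return one_way(p, q) + one_way(q, p)


def mmd_cov(ref_clouds, gen_clouds, dist=chamfer):
    """Return (MMD, COV) of generated clouds with respect to reference clouds."""
    d = [[dist(r, g) for g in gen_clouds] for r in ref_clouds]  # |ref| x |gen| distances
    mmd = sum(min(row) for row in d) / len(ref_clouds)
    # For every generated cloud, the index of its nearest reference cloud.
    matched = {min(range(len(ref_clouds)), key=lambda i: d[i][j]) for j in range(len(gen_clouds))}
    cov = len(matched) / len(ref_clouds)
    return mmd, cov
```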
To generate signals from a pre-trained model in ./results, change mode in the config file from train to gen and run:
python main.py --exp ldm --configs configs/ldm/img.yaml
For arbitrary-resolution 2D image generation with consistent content, you only need to change test_resolution in the config file while keeping the seed fixed.
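One way to see why changing only test_resolution works: an implicit neural representation is queried at continuous coordinates, so a higher resolution just samples the same domain more densely. The sketch below builds pixel-center coordinates for any H x W; the [-1, 1] range and pixel-center convention are assumptions, not necessarily DDMI's.

```python
def pixel_center_grid(height, width):
    """Return (y, x) coordinates of pixel centers in [-1, 1] for an H x W image.

    Every resolution covers the same continuous square, so a fixed latent (fixed seed)
    decoded on a 64x64 grid and on a 256x256 grid depicts the same content.
    """
    ys = [-1 + (2 * i + 1) / height for i in range(height)]
    xs = [-1 + (2 * j + 1) / width for j in range(width)]
    return [[(y, x) for x in xs] for y in ys]
```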
Checkpoints for the pre-trained models can be downloaded from here. Place the downloaded checkpoints in the ./results folder and set pretrained in the config file to True for evaluation.
This repo is built upon ADM, latent-diffusion, and PVDM.
@inproceedings{park2024ddmi,
title={DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations},
author={Park, Dogyun and Kim, Sihyeon and Lee, Sojin and Kim, Hyunwoo J},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024}
}