This is a PyTorch implementation of the NeurIPS 2022 paper "ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings": https://arxiv.org/abs/2206.12403
Arjun Majumdar*, Gunjan Aggarwal*, Bhavika Devnani, Judy Hoffman and Dhruv Batra
Georgia Institute of Technology, Meta AI
We present a scalable approach for learning open-world object-goal navigation (ObjectNav) – the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., “find a sink”). Our approach is entirely zero-shot – i.e., it does not require ObjectNav rewards or demonstrations of any kind.
Figure: Model architecture for ZSON.
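The key idea is that image goals and object (text) goals can be embedded into the same semantic space with CLIP, so a policy trained for ImageNav can be evaluated zero-shot on ObjectNav. The snippet below is a minimal illustration of that shared embedding space using the open-source `clip` package; it is not the repository's model code, and the backbone, goal string, and image path are placeholders.

```python
# Minimal illustration (not the repo's code): CLIP places image goals and
# text goals ("find a sink") in one shared semantic embedding space.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Any CLIP backbone works for this illustration; RN50 is chosen arbitrarily.
model, preprocess = clip.load("RN50", device=device)

# Text (object) goal, e.g. the category "sink".
text = clip.tokenize(["a sink"]).to(device)

# Image goal: a view of the target object (placeholder path).
image = preprocess(Image.open("goal_view.png")).unsqueeze(0).to(device)

with torch.no_grad():
    text_emb = model.encode_text(text)     # shape (1, embed_dim)
    image_emb = model.encode_image(image)  # shape (1, embed_dim)

# Because both goals live in the same space, their similarity is meaningful.
similarity = torch.cosine_similarity(text_emb.float(), image_emb.float())
print(similarity.item())
```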
All the required data can be downloaded from here.
- Create a conda environment:
  conda create -n zson python=3.7 cmake=3.14.0
  conda activate zson
- Install pytorch 1.10.2:
  conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=11.3 -c pytorch -c conda-forge
- Install habitat-sim:
  conda install habitat-sim-challenge-2022 headless -c conda-forge -c aihabitat
- Install habitat-lab:
  git clone --branch challenge-2022 https://github.com/facebookresearch/habitat-lab.git habitat-lab-challenge-2022
  cd habitat-lab-challenge-2022
  pip install -r requirements.txt
  python setup.py develop --all # install habitat and habitat_baselines
  cd ..
- Set up this repository:
  git clone [email protected]:gunagg/zson.git
  cd zson
  pip install -r requirements.txt
  python setup.py develop
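A quick way to confirm the installation (this check is not part of the repository) is to import the key packages and print their versions:

```python
# Quick environment sanity check (not part of the repo): all three packages
# should import without errors after the steps above.
import torch
import habitat
import habitat_sim  # noqa: F401  (a successful import is the check)

print("torch:", torch.__version__)          # expected: 1.10.2
print("habitat-lab:", habitat.__version__)  # from the challenge-2022 branch
print("cuda available:", torch.cuda.is_available())
```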
- Follow the instructions here to set up the data/scene_datasets/ directory. Gibson scenes can be found here.
- Download the HM3D ImageNav training dataset:
  wget https://huggingface.co/gunjan050/ZSON/resolve/main/imagenav_hm3d.zip
  unzip imagenav_hm3d.zip
  rm imagenav_hm3d.zip # clean-up
- Download the MP3D ObjectNav dataset:
  wget https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/m3d/v1/objectnav_mp3d_v1.zip
  mkdir -p data/datasets/objectnav/mp3d/v1
  unzip objectnav_mp3d_v1.zip -d data/datasets/objectnav/mp3d/v1
  rm objectnav_mp3d_v1.zip # clean-up
- Download the HM3D ObjectNav dataset:
  wget https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v1/objectnav_hm3d_v1.zip
  unzip objectnav_hm3d_v1.zip -d data/datasets/objectnav/
  rm objectnav_hm3d_v1.zip # clean-up
- Download the trained checkpoints zson_conf_A.pth and zson_conf_B.pth, and move them to data/checkpoints/.
- To train policies using the OVRL pretrained RGB encoder, download the model weights from here and move them to data/models/. More details on the encoder can be found here.
- Set up data/goal_datasets using the script tools/extract-goal-features.py. This caches CLIP goal embeddings for faster training; a sketch of what this step looks like is given after the directory tree below.

Your directory structure should now look like this:
.
+-- habitat-lab-v0.2.1/
|   ...
+-- zson/
|   +-- data/
|   |   +-- datasets/
|   |   |   +-- objectnav/
|   |   |   +-- imagenav/
|   |   +-- scene_datasets/
|   |   |   +-- hm3d/
|   |   |   +-- mp3d/
|   |   +-- goal_datasets/
|   |   |   +-- imagenav/
|   |   |   |   +-- hm3d/
|   |   +-- models/
|   |   +-- checkpoints/
|   +-- zson/
|   ...
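As noted above, tools/extract-goal-features.py pre-computes CLIP goal embeddings so they do not have to be re-encoded on every training step. The sketch below shows the general shape of such a caching pass; it assumes the `clip` package and a hypothetical directory of goal images, and the actual script's arguments, input layout, and output format may differ.

```python
# Hedged sketch of goal-embedding caching; the real tools/extract-goal-features.py
# may use different inputs, arguments, and output format.
from pathlib import Path

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

goal_dir = Path("data/datasets/imagenav/hm3d/goal_images")  # hypothetical layout
out_dir = Path("data/goal_datasets/imagenav/hm3d")
out_dir.mkdir(parents=True, exist_ok=True)

for image_path in sorted(goal_dir.glob("*.png")):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        embedding = model.encode_image(image).squeeze(0).cpu()
    # One cached tensor per goal; training can load these instead of running CLIP.
    torch.save(embedding, out_dir / f"{image_path.stem}.pt")
```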
To train the ImageNav policies on HM3D, submit the corresponding SLURM script:
sbatch scripts/imagenav-v1-hm3d-ovrl-rn50.sh
sbatch scripts/imagenav-v2-hm3d-ovrl-rn50.sh
To evaluate a trained ZSON checkpoint, use the following command:
sbatch scripts/objnav-eval-$DESIRED-CONFIGURATION$-$DATASET$.sh
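The $DESIRED-CONFIGURATION$ and $DATASET$ placeholders presumably correspond to the checkpoint configuration (conf_A or conf_B, matching the downloaded checkpoints) and the evaluation dataset (mp3d or hm3d), yielding a script name along the lines of scripts/objnav-eval-conf_A-hm3d.sh; this example name is hypothetical, so check the scripts/ directory for the exact file names.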
If you use this code in your research, please consider citing:
@inproceedings{majumdar2022zson,
title={ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings},
author={Majumdar, Arjun and Aggarwal, Gunjan and Devnani, Bhavika and Hoffman, Judy and Batra, Dhruv},
booktitle={Neural Information Processing Systems (NeurIPS)},
year={2022}
}