FreeDA: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)
Our setup is based on PyTorch 1.13.1, mmcv 1.6.2, and mmsegmentation 0.27.0. To create the same environment that we used for our experiments:
python3 -m venv ./freeda
source ./freeda/bin/activate
pip install -U pip setuptools wheel
Install PyTorch:
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
Install other dependencies:
pip install -r requirements.txt
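As a quick sanity check (a minimal sketch, assuming the version pins above), the following should print 1.13.1+cu117, 1.6.2, and 0.27.0, plus True if CUDA is visible:
python -c "import torch, mmcv, mmseg; print(torch.__version__, mmcv.__version__, mmseg.__version__, torch.cuda.is_available())"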
Download both the prototype embeddings and the faiss index, and decompress them into ./data:
cd ./data
mkdir "prototype_embeddings"
tar -xvf prototype_embeddings.tar -C ./prototype_embeddings
unzip faiss_index.zip
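To verify that the index was extracted correctly, you can load it with faiss (a minimal sketch; the filename index.faiss is an assumption, substitute the actual file unpacked from faiss_index.zip):
# the path below is hypothetical; point it at the file extracted from faiss_index.zip
python -c "import faiss; index = faiss.read_index('index.faiss'); print(index.ntotal, index.d)"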
This data preparation section is adapted from the TCL and GroupViT READMEs.
The overall file structure is as follows:
src
├── data
│ ├── cityscapes
│ │ ├── leftImg8bit
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── gtFine
│ │ │ ├── train
│ │ │ ├── val
│ ├── VOCdevkit
│ │ ├── VOC2012
│ │ │ ├── JPEGImages
│ │ │ ├── SegmentationClass
│ │ │ ├── ImageSets
│ │ │ │ ├── Segmentation
│ │ ├── VOC2010
│ │ │ ├── JPEGImages
│ │ │ ├── SegmentationClassContext
│ │ │ ├── ImageSets
│ │ │ │ ├── SegmentationContext
│ │ │ │ │ ├── train.txt
│ │ │ │ │ ├── val.txt
│ │ │ ├── trainval_merged.json
│ │ ├── VOCaug
│ │ │ ├── dataset
│ │ │ │ ├── cls
│ ├── ade
│ │ ├── ADEChallengeData2016
│ │ │ ├── annotations
│ │ │ │ ├── training
│ │ │ │ ├── validation
│ │ │ ├── images
│ │ │ │ ├── training
│ │ │ │ ├── validation
│ ├── coco_stuff164k
│ │ ├── images
│ │ │ ├── train2017
│ │ │ ├── val2017
│ │ ├── annotations
│ │ │ ├── train2017
│ │ │ ├── val2017
Please download and set up the PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20k datasets following the MMSegmentation data preparation document.
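For reference, the annotation conversion steps from that document (a sketch, assuming a checkout of mmsegmentation 0.27.0 as the working directory and the Detail API installed for PASCAL Context; flags and paths follow the MMSegmentation docs):
# convert PASCAL VOC augmented annotations
python tools/convert_datasets/voc_aug.py data/VOCdevkit data/VOCdevkit/VOCaug --nproc 8
# convert PASCAL Context annotations (requires the Detail API)
python tools/convert_datasets/pascal_context.py data/VOCdevkit data/VOCdevkit/VOC2010/trainval_merged.json
# convert COCO-Stuff164k annotations
python tools/convert_datasets/coco_stuff164k.py data/coco_stuff164k --nproc 8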
Once the datasets are in place, FreeDA can be evaluated on each benchmark with the following commands.
Pascal VOC:
python -m torch.distributed.run main.py --eval --eval_cfg configs/pascal20/freeda_pascal20.yml --eval_base_cfg configs/pascal20/eval_pascal20.yml
Pascal Context:
python -m torch.distributed.run main.py --eval --eval_cfg configs/pascal59/freeda_pascal59.yml --eval_base_cfg configs/pascal59/eval_pascal59.yml
COCO-Stuff:
python -m torch.distributed.run main.py --eval --eval_cfg configs/cocostuff/freeda_cocostuff.yml --eval_base_cfg configs/cocostuff/eval_cocostuff.yml
Cityscapes:
python -m torch.distributed.run main.py --eval --eval_cfg configs/cityscapes/freeda_cityscapes.yml --eval_base_cfg configs/cityscapes/eval_cityscapes.yml
ADE20K:
python -m torch.distributed.run main.py --eval --eval_cfg configs/ade/freeda_ade.yml --eval_base_cfg configs/ade/eval_ade.yml
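The commands above rely on the torch.distributed.run defaults (a single process on a single node). To evaluate with multiple GPUs, the standard --nproc_per_node flag applies, e.g. (a sketch, assuming main.py derives its device from the local rank as usual for mmsegmentation-based code):
python -m torch.distributed.run --nproc_per_node=4 main.py --eval --eval_cfg configs/ade/freeda_ade.yml --eval_base_cfg configs/ade/eval_ade.yml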
If you find FreeDA useful for your work, please cite:
@inproceedings{barsellotti2024training,
title={Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation},
author={Barsellotti, Luca and Amoroso, Roberto and Cornia, Marcella and Baraldi, Lorenzo and Cucchiara, Rita},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}