This repository is the official implementation of IMProv, introduced in the TMLR 2024 paper:
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang
For more details, see the project page: https://jerryxu.net/IMProv/
- Jiarui Xu's Project Page (with additional visual results)
- HuggingFace 🤗 Model
- Run the demo on Google Colab
- arXiv Page
If you find our work useful in your research, please cite:
```bibtex
@article{xu2023improv,
  author  = {Xu, Jiarui and Gandelsman, Yossi and Bar, Amir and Yang, Jianwei and Gao, Jianfeng and Darrell, Trevor and Wang, Xiaolong},
  title   = {{IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks}},
  journal = {arXiv preprint arXiv:2312.01771},
  year    = {2023},
}
```
Roadmap:

- Release inference code and demo.
- Release checkpoints.
- Release S2CV dataset.
- Release training codes.
Install dependencies by running:
```bash
conda install pytorch=2.0 torchvision pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/xvjiarui/IMProv.git
pip install -e IMProv
```
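To sanity-check the installation before running the demo, you can verify that PyTorch imports and that CUDA is visible (this is just a quick check, not part of the official setup):

```python
# Quick environment check: prints the PyTorch version and whether CUDA is available.
import torch

print(torch.__version__, torch.cuda.is_available())
```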
Run the demo with:

```bash
python demo/demo.py --output demo/output.png
```

The output is saved to `demo/output.png`.
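At a high level, inpainting-based prompting tiles an example input-output pair together with a query image into a single canvas and lets the model inpaint the missing quadrant, conditioned on a text prompt. The snippet below is only an illustrative sketch of that canvas construction, not the API used by `demo/demo.py`; the grid layout, tile size, image paths, and helper name are assumptions for illustration.

```python
# Illustrative sketch only: assembling a 2x2 visual-prompt canvas for
# inpainting-based prompting. The actual prompt construction lives in
# demo/demo.py; the layout, tile size, and paths below are placeholders.
from PIL import Image

TILE = 224  # per-quadrant resolution (assumed)

def make_prompt_canvas(example_input, example_output, query_input):
    """Tile [example input | example output] over [query | blank-to-inpaint]."""
    canvas = Image.new("RGB", (2 * TILE, 2 * TILE), color="gray")
    canvas.paste(example_input.resize((TILE, TILE)), (0, 0))       # top-left
    canvas.paste(example_output.resize((TILE, TILE)), (TILE, 0))   # top-right
    canvas.paste(query_input.resize((TILE, TILE)), (0, TILE))      # bottom-left
    # The bottom-right quadrant is left blank; the model fills it in,
    # guided by the example pair and the accompanying text prompt.
    return canvas

if __name__ == "__main__":
    ex_in = Image.open("examples/input.png")    # placeholder paths
    ex_out = Image.open("examples/output.png")
    query = Image.open("examples/query.png")
    make_prompt_canvas(ex_in, ex_out, query).save("prompt_canvas.png")
```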