Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

Official Implementation of
"Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention"
🏆 Recognized as "Best Paper Candidate" at ECCV 2024 (Milan, Italy)

📰 News

2025.05: Release the dataset MegaSG introduced in Scene-Bench
2025.02: Add checkpoints for the TPAMI version
2024.10: Our paper has been recognized as "Best Paper Candidate" (Milan, Italy, ECCV 2024)

🛠️ Setup

For simplicity, you can directly run:

bash install.sh

which includes the following steps:

Install PyTorch 1.9.1 and other dependencies:

pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

(Adjust CUDA version if necessary.)
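After installation, a quick check that the pinned versions actually landed in the environment can save a failed training run. The following is a minimal sketch, not part of the repository: the helper names `base_version` and `check_pins` are hypothetical, and the pins are copied from the pip command above. Local build tags such as `+cu111` are ignored so that a different CUDA build still passes.

```python
from importlib import metadata

# Pins copied from the pip command above (hypothetical helper, not repo code).
EXPECTED = {"torch": "1.9.1", "torchvision": "0.10.1", "torchaudio": "0.9.1"}


def base_version(version: str) -> str:
    """Drop a local build tag, e.g. '1.9.1+cu111' -> '1.9.1'."""
    return version.split("+", 1)[0]


def check_pins(expected=EXPECTED, lookup=metadata.version):
    """Return {package: problem} for each package that is missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if base_version(found) != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins()
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("PyTorch packages match the pinned versions.")
```

An empty result means all three packages match; anything else lists what to reinstall.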

Install GroundingDINO and download pretrained weights:

cd GroundingDINO && python3 setup.py install && cd ..
mkdir -p $PWD/GroundingDINO/weights/
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth -O $PWD/GroundingDINO/weights/groundingdino_swint_ogc.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth -O $PWD/GroundingDINO/weights/groundingdino_swinb_cogcoor.pth
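An interrupted download can leave an empty or missing weight file, which only surfaces later when training tries to load it. The sketch below is a hypothetical helper (`missing_weights` is not part of the repository) that checks for the two file names used in the wget commands above. It assumes the weights live in `GroundingDINO/weights` relative to the repository root, which is where the training commands in this README look for them.

```python
from pathlib import Path

# File names taken from the wget commands above.
WEIGHT_FILES = ("groundingdino_swint_ogc.pth", "groundingdino_swinb_cogcoor.pth")


def missing_weights(weights_dir, names=WEIGHT_FILES):
    """Return the names that are absent or empty (e.g. an interrupted download)."""
    root = Path(weights_dir)
    missing = []
    for name in names:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    print(missing_weights("GroundingDINO/weights") or "all weights present")
```

This only checks presence and size; it does not verify file integrity.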

📚 Dataset

Supported datasets:

VG150
COCO

Prepare the datasets under the data/ folder following the instructions.

📈 Closed-set SGG

Training

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinT_OGC_full.py ./data ./logs/ovsgtr_vg_swint_full ./GroundingDINO/weights/groundingdino_swint_ogc.pth

or using Swin-B:

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinB_full.py ./data ./logs/ovsgtr_vg_swinb_full ./GroundingDINO/weights/groundingdino_swinb_cogcoor.pth

Adjust CUDA_VISIBLE_DEVICES if needed. Effective batch size = batch size × number of GPUs.
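The positional arguments and the batch-size rule above can be sketched as follows. This is an illustrative helper, not repository code: the function names are hypothetical, the argument order is read off the two example commands, and the per-GPU batch size of 2 in the usage lines is an arbitrary example value rather than the repository default.

```python
import shlex


def train_command(dataset, config, data_dir, output_dir, pretrained):
    """Assemble the scripts/DINO_train_dist.sh call in the argument order shown above."""
    args = ["bash", "scripts/DINO_train_dist.sh",
            dataset, config, data_dir, output_dir, pretrained]
    return " ".join(shlex.quote(a) for a in args)


def num_visible_gpus(cuda_visible_devices: str) -> int:
    """Count the device IDs in a CUDA_VISIBLE_DEVICES string such as '0,1,2,3'."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def effective_batch_size(per_gpu_batch_size: int, cuda_visible_devices: str) -> int:
    """Effective batch size = batch size x number of GPUs."""
    return per_gpu_batch_size * num_visible_gpus(cuda_visible_devices)


if __name__ == "__main__":
    print(train_command(
        "vg",
        "./config/GroundingDINO_SwinT_OGC_full.py",
        "./data",
        "./logs/ovsgtr_vg_swint_full",
        "./GroundingDINO/weights/groundingdino_swint_ogc.pth",
    ))
    # Example values only: 2 samples per GPU on 4 visible GPUs.
    print(effective_batch_size(2, "0,1,2,3"))
```

If you change the number of visible GPUs, the effective batch size changes with it, so results may not be directly comparable unless the per-GPU batch size is adjusted to compensate.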

Inference

bash scripts/DINO_eval.sh vg [config file] [data path] [output path] [checkpoint]

or

bash scripts/DINO_eval_dist.sh vg [config file] [data path] [output path] [checkpoint]

📥 Checkpoints (Closed-set SGG)

Backbone	R@20/50/100	Checkpoint	Config
Swin-T	26.97 / 35.82 / 41.38	link	config/GroundingDINO_SwinT_OGC_full.py
Swin-T (pretrained on MegaSG)	27.34 / 36.27 / 41.95	link	config/GroundingDINO_SwinT_OGC_full.py
Swin-B	27.75 / 36.44 / 42.35	link	config/GroundingDINO_SwinB_full.py
Swin-B (w/o freq bias & focal loss)	27.53 / 36.18 / 41.79	link	config/GroundingDINO_SwinB_full_open.py
Swin-B (pretrained on MegaSG)	28.61 / 37.58 / 43.41	link	config/GroundingDINO_SwinB_full_open.py

🚀 OvD-SGG (Open-vocabulary Detection SGG)

Set:

sg_ovd_mode = True

📥 Checkpoints (OvD-SGG)

Backbone	R@20/50/100 (Base+Novel)	R@20/50/100 (Novel)	Checkpoint	Config
Swin-T	12.34 / 18.14 / 23.20	6.90 / 12.06 / 16.49	link	config/GroundingDINO_SwinT_OGC_ovd.py
Swin-B	15.43 / 21.35 / 26.22	10.21 / 15.58 / 19.96	link	config/GroundingDINO_SwinB_ovd.py
Swin-T (pretrained on MegaSG)	14.33 / 20.91 / 25.98	10.52 / 17.30 / 22.90	link	config/GroundingDINO_SwinT_OGC_ovd.py
Swin-B (pretrained on MegaSG)	15.21 / 21.21 / 26.12	10.31 / 15.78 / 20.47	link	config/GroundingDINO_SwinB_ovd.py

🔥 OvR-SGG (Open-vocabulary Relation SGG)

Set:

sg_ovr_mode = True

📥 Checkpoints (OvR-SGG)

Backbone	R@20/50/100 (Base+Novel)	R@20/50/100 (Novel)	Checkpoint	Config	Pre-trained Checkpoint	Pre-trained Config
Swin-T	15.85 / 20.50 / 23.90	10.17 / 13.47 / 16.20	link	config/GroundingDINO_SwinT_OGC_ovr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B	17.63 / 22.90 / 26.68	12.09 / 16.37 / 19.73	link	config/GroundingDINO_SwinB_ovr.py	link	config/GroundingDINO_SwinB_pretrain.py
Swin-T (pretrained on MegaSG)	19.38 / 25.40 / 29.71	12.23 / 17.02 / 21.15	link	config/GroundingDINO_SwinT_OGC_ovr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B (pretrained on MegaSG)	21.09 / 27.92 / 32.74	16.59 / 22.86 / 27.73	link	config/GroundingDINO_SwinB_ovr.py	~~link~~	config/GroundingDINO_SwinB_pretrain.py

🌟 OvD+R-SGG (Joint Open-vocabulary SGG)

Set:

sg_ovd_mode = True
sg_ovr_mode = True
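The two flags together select one of the four settings covered in this README. The lookup below is a minimal sketch of that mapping, assuming that leaving both flags off corresponds to the closed-set setting (the README does not state this explicitly). The `describe` helper and both dictionaries are hypothetical; the config file names follow the pattern in the checkpoint tables, which also list a `GroundingDINO_SwinB_full_open.py` variant that this sketch does not cover.

```python
# (sg_ovd_mode, sg_ovr_mode) -> (setting name, config-name suffix).
# Suffixes are read off the checkpoint tables in this README.
SETTINGS = {
    (False, False): ("Closed-set SGG", "full"),  # assumption: both flags off
    (True, False): ("OvD-SGG", "ovd"),
    (False, True): ("OvR-SGG", "ovr"),
    (True, True): ("OvD+R-SGG", "ovdr"),
}

# Config-name prefix per backbone, as used in the tables.
BACKBONE_PREFIX = {
    "Swin-T": "GroundingDINO_SwinT_OGC",
    "Swin-B": "GroundingDINO_SwinB",
}


def describe(sg_ovd_mode=False, sg_ovr_mode=False, backbone="Swin-T"):
    """Return (setting name, matching config path) for a flag combination."""
    name, suffix = SETTINGS[(bool(sg_ovd_mode), bool(sg_ovr_mode))]
    return name, f"config/{BACKBONE_PREFIX[backbone]}_{suffix}.py"


if __name__ == "__main__":
    print(describe(sg_ovd_mode=True, sg_ovr_mode=True, backbone="Swin-B"))
```

In practice the simplest route is to start from the config file listed for the setting you want rather than toggling flags by hand.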

📥 Checkpoints (OvD+R-SGG)

Backbone	R@20/50/100 (Joint)	R@20/50/100 (Novel Object)	R@20/50/100 (Novel Relation)	Checkpoint	Config	Pre-trained Checkpoint	Pre-trained Config
Swin-T	10.02 / 13.50 / 16.37	10.56 / 14.32 / 17.48	7.09 / 9.19 / 11.18	link	config/GroundingDINO_SwinT_OGC_ovdr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B	12.37 / 17.14 / 21.03	12.63 / 17.58 / 21.70	10.56 / 14.62 / 18.22	link	config/GroundingDINO_SwinB_ovdr.py	link	config/GroundingDINO_SwinB_pretrain.py
Swin-T (pretrained on MegaSG)	10.67 / 15.15 / 18.82	8.22 / 12.49 / 16.29	9.62 / 13.68 / 17.19	link	config/GroundingDINO_SwinT_OGC_ovdr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B (pretrained on MegaSG)	12.54 / 17.84 / 21.95	10.29 / 15.66 / 19.84	12.21 / 17.15 / 21.05	link	config/GroundingDINO_SwinB_ovdr.py	~~link~~	config/GroundingDINO_SwinB_pretrain.py

🤝 Acknowledgement

We thank the authors of the projects this work builds on for their awesome open-source code and models.

📖 Citation

If you find our work helpful, please cite:

@inproceedings{chen2024expanding,
  title={Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention},
  author={Chen, Zuyao and Wu, Jinlin and Lei, Zhen and Zhang, Zhaoxiang and Chen, Changwen},
  booktitle={European Conference on Computer Vision (ECCV)},
  pages={108--124},
  year={2024}
}

✨ Enjoy Exploring Open-Vocabulary Scene Graph Generation!
