This repository contains the DATA302 (Introduction to Computer Vision and its Applications, 기초컴퓨터비전이론및응용) course project, based on the GraphVQA paper.
Combining scene graph generation with visual question answering has been actively studied and developed recently. However, the need to explore many parts of the scene graph during reasoning limits more flexible reasoning. In this paper, we propose an approach that leverages the specific objects in the question that correspond to objects in the image. We realize this by guiding attention, combining an attention prior with the attention score, and we propose a model that carries out this process: the Guided Attention Network in Graphs (GANG). We show that our approach outperforms the baseline model (GraphVQA) and that the performance of GANG is further boosted by GAT-like encoding.
Since our model was trained and tested with limited resources, all results were recorded and compared under the same conditions across the compared models. Only 32,000 questions were randomly sampled from the full GQA dataset for training. Random seeds were fixed so that randomness would not affect the final performance, and all experiments were run for 30 epochs. Our final model outperforms both the baseline model and the interim model (DGA) by a large margin on the Accuracy metric!
This repo provides the source code of our paper: GraphVQA: Language-Guided Graph Neural Networks for Scene Graph Question Answering (NAACL 2021 MAI Workshop) [PDF].
@inproceedings{2021graphvqa,
author = {Weixin Liang and
Yanhao Jiang and
Zixuan Liu},
title = {{GraphVQA}: Language-Guided Graph Neural Networks for Graph-based Visual
Question Answering},
booktitle = "Proceedings of the Third Workshop on Multimodal Artificial Intelligence",
month = jun,
year = "2021",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2021.maiworkshop-1.12",
doi = "10.18653/v1/2021.maiworkshop-1.12",
pages = "79--86"
}
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering (NeurIPS KR2ML 2020). Weixin Liang, Feiyang Niu, Aishwarya Reganti, Govind Thattai and Gokhan Tur. [PDF] [Lightning Talk] [Blog] [Github] [Poster] [NeurIPS KR2ML 2020]
Images are more than a collection of objects or attributes --- they represent a web of relationships among interconnected objects. Scene Graph has emerged as a new modality for a structured graphical representation of images. Scene Graph encodes objects as nodes connected via pairwise relations as edges. To support question answering on scene graphs, we propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question as multiple iterations of message passing among graph nodes. We explore the design space of GraphVQA framework, and discuss the trade-off of different design choices. Our experiments on GQA dataset show that GraphVQA outperforms the state-of-the-art model by a large margin (88.43% vs. 94.78%). Our code is available at https://github.com/codexxxl/GraphVQA
Create a conda environment with Python 3.6.
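A minimal sketch of this step (the environment name graphvqa is just an example):
```
# Create and activate a Python 3.6 conda environment
conda create -n graphvqa python=3.6
conda activate graphvqa
```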
Run the following commands in the created conda environment (note: torchtext requires a version < 0.9.0):
conda install -c pytorch "torchtext<0.9.0"
conda install -c conda-forge spacy
conda install -c conda-forge cupy
python -m spacy download en_core_web_sm
conda install -c anaconda nltk
Execute python and run the following:
import nltk
nltk.download('wordnet')
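Equivalently, this can be run as a one-liner from the shell:
```
# Downloads the WordNet corpus required above
python -c "import nltk; nltk.download('wordnet')"
```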
Follow the link below to install PyTorch Geometric via binaries: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html#installation-via-binaries
Example installation commands with PyTorch 1.4.0 and CUDA 10.0 (note: replace the torch-1.4.0+cu100 field with your own installed PyTorch and CUDA versions):
pip install --no-index torch-scatter -f https://pytorch-geometric.com/whl/torch-1.4.0+cu100.html
pip install --no-index torch-sparse -f https://pytorch-geometric.com/whl/torch-1.4.0+cu100.html
pip install --no-index torch-cluster -f https://pytorch-geometric.com/whl/torch-1.4.0+cu100.html
pip install --no-index torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.4.0+cu100.html
pip install torch-geometric
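If you are unsure which wheel URL to use, you can first check the locally installed PyTorch and CUDA versions and substitute them into the torch-X.Y.Z+cuXXX field above:
```
# Prints e.g. 1.4.0 and 10.0
python -c "import torch; print(torch.__version__); print(torch.version.cuda)"
```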
Download the raw scene graph data from:
https://nlp.stanford.edu/data/gqa/sceneGraphs.zip
Download the raw question data from:
https://nlp.stanford.edu/data/gqa/questions1.2.zip
Put the scene graph JSON files train_sceneGraphs.json and val_sceneGraphs.json into sceneGraphs/
Put the question JSON files train_balanced_questions.json, val_balanced_questions.json, test_balanced_questions.json, and testdev_balanced_questions.json into questions/original/
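For reference, a possible way to lay the files out from the downloaded archives is sketched below (the exact layout inside the zips is an assumption; adjust the paths if the archives extract into nested folders):
```
# Assumes sceneGraphs.zip and questions1.2.zip were downloaded into the GraphVQA repo root
unzip sceneGraphs.zip -d sceneGraphs/
unzip questions1.2.zip -d questions_raw/
mkdir -p questions/original
cp questions_raw/train_balanced_questions.json questions_raw/val_balanced_questions.json \
   questions_raw/test_balanced_questions.json questions_raw/testdev_balanced_questions.json \
   questions/original/
```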
After this step, the data file structure should look like this:
GraphVQA
    questions/
        original/
            train_balanced_questions.json
            val_balanced_questions.json
            test_balanced_questions.json
            testdev_balanced_questions.json
    sceneGraphs/
        train_sceneGraphs.json
        val_sceneGraphs.json
Replace line 13 in Constants.py with your own root directory that contains this source code folder:
ROOT_DIR = pathlib.Path('/Users/yanhaojiang/Desktop/cs224w_final/')
Note that ROOT_DIR does not contain the repo name GraphVQA. E.g., for the ROOT_DIR above, the source code folder would be /Users/yanhaojiang/Desktop/cs224w_final/GraphVQA.
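For example (the path below is purely hypothetical), if the repo was cloned to /home/user/projects/GraphVQA, line 13 of Constants.py could be updated like this:
```
# Point ROOT_DIR at the parent directory of the GraphVQA folder (GNU sed; edit the file by hand if you prefer)
sed -i "13s|.*|ROOT_DIR = pathlib.Path('/home/user/projects/')|" Constants.py
```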
Run command:
python preprocess.py
The following commands should run without error:
python pipeline_model_gat.py
python gqa_dataset_entry.py
Single GPU training:
CUDA_VISIBLE_DEVICES=0 python mainExplain_gat.py --log-name debug.log --batch-size=200 --lr_drop=90
Distributed training:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env mainExplain_gat.py --workers=4 --batch-size=200 --lr_drop=90
To kill a distributed training run:
kill $(ps aux | grep mainExplain_gat.py | grep -v grep | awk '{print $2}')
The baseline and other test models are trained in the same way by executing the corresponding mainExplain_{lcgn, gcn, gine}.py file. Their related files live in baseline_and_test_models/ (note: move them out of that folder before training; see the sketch after the list below).
Corresponding to GraphVQA-GAT's model and training files (gat_skip.py, pipeline_model_gat.py, and mainExplain_gat.py), those model files are:
- Baseline LCGN: lcgn.py, pipeline_model_lcgn.py, mainExplain_lcgn.py
- GraphVQA-GCN: pipeline_model_gcn.py, mainExplain_gcn.py
- GraphVQA-GINE: pipeline_model_gine.py, mainExplain_gine.py
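As an illustration, moving the LCGN baseline files out and launching training might look like the sketch below (it assumes the files sit under baseline_and_test_models/ in the repo root and that mainExplain_lcgn.py accepts the same flags as mainExplain_gat.py):
```
# Move the LCGN files into the repo root, then train the baseline (flags assumed identical to the GAT run)
mv baseline_and_test_models/lcgn.py \
   baseline_and_test_models/pipeline_model_lcgn.py \
   baseline_and_test_models/mainExplain_lcgn.py .
CUDA_VISIBLE_DEVICES=0 python mainExplain_lcgn.py --log-name lcgn_debug.log --batch-size=200 --lr_drop=90
```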
We re-organized the evaluation script provided by the official GQA release; the original script and evaluation data can be found at https://cs.stanford.edu/people/dorarad/gqa/evaluate.html
Step 1: Generate the evaluation dataset. To evaluate your model, there are two options:
- Use validation_balanced set of programs.
- Use validation_all set provided by GQA official.
First, download the evaluation data from https://nlp.stanford.edu/data/gqa/eval.zip, then unzip the file and move val_all_questions.json into GraphVQA/questions/original/.
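A possible way to do this from the repo root is sketched below (where val_all_questions.json ends up after unzipping is an assumption; adjust the path accordingly):
```
unzip eval.zip
mv val_all_questions.json questions/original/
```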
Now we will have:
GraphVQA
    questions/
        original/
            val_all_questions.json
Option 1: After running Step 3 (preprocess.py), we already have:
GraphVQA
    questions/
        val_balanced_programs.json
Then run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env mainExplain_gat.py --workers=4 --batch-size=4000 --evaluate --resume=outputdir/your_checkpoint.pth --output_dir='./your_outputdir/' --evaluate_sets='val_unbiased'
You should get a results JSON file located at './your_outputdir/dump_results.json'.
Then run:
python eval.py --predictions=./your_outputdir/dump_results.json --consistency
Option 2: If you want to use the validation_all set, run:
python preprocess.py --val-all=True
We should get:
GraphVQA
    questions/
        val_all_programs.json
Then run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env mainExplain_gat.py --workers=4 --batch-size=4000 --evaluate --resume=outputdir/your_checkpoint.pth --output_dir='./your_outputdir/' --evaluate_sets='val_all'
You should get a results JSON file located at './your_outputdir/dump_results.json'.
Then run:
python eval.py --predictions=./your_outputdir/dump_results.json --consistency