Code and pre-trained models for **ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks**.

*Note: This is a beta release.*

## Repository Setup

1. Create a fresh conda environment, and install all dependencies; a sketch of this step follows.
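
A minimal sketch of this step, assuming Python 3.6 and a `requirements.txt` at the repository root (both are assumptions):

```bash
# Create and activate a fresh environment (Python version is an assumption).
conda create -n vilbert python=3.6
conda activate vilbert
# Install the repository's dependencies (assumes requirements.txt exists).
pip install -r requirements.txt
```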

Check `README.md` under `data` for more details.

|Model|Objective|Link|
|:---:|:---:|:---:|
|ViLBERT 6-Layer| RefCOCO+ |[Link]()|
|ViLBERT 6-Layer| Image Retrieval |[Link]()|

### Zero-Shot Image Retrieval

We can directly use the pre-trained ViLBERT model for zero-shot image retrieval on Flickr30k.

1: Download the model pretrained on the `Conceptual Caption` objective and put it under `save`.

2: Update `features_h5path1` and `val_annotations_jsonpath` in `vlbert_task.yml` to load the Flickr30k test-set image features and JSON file (the default is the training features); a config sketch follows these steps.

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

```bash
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot
```
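
The step-2 edit might look like the following fragment of `vlbert_task.yml`; the task key and file paths are placeholders, not values taken from this README:

```yaml
TASK3:                                                # assumed entry for the retrieval task
  features_h5path1: data/flickr30k/test_features.h5   # placeholder: test-set image features
  val_annotations_jsonpath: data/flickr30k/test.json  # placeholder: test-set annotations
```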

### Image Retrieval

1: Download the model pretrained on the `Image Retrieval` objective and put it under `save`.

2: Update `features_h5path1` and `val_annotations_jsonpath` in `vlbert_task.yml` to load the Flickr30k test-set image features and JSON file (the default is the training features), as in the zero-shot section above.

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

```bash
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1
```

### VQA

1: Download the model pretrained on the `VQA` objective and put it under `save`.

2: To test on the held-out validation split, use the following command:
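
The command is not filled in here; a plausible sketch that follows the pattern of the other evaluation commands, where the checkpoint path, `--task` id, and split name are all assumptions:

```bash
# Sketch only: checkpoint path, task id, and split are assumptions.
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VQA_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 0 --split val
```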

### VCR

1: Download the model pretrained on the `VCR` objective and put it under `save`.

2: To test on VCR Q->A:
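
The command is likewise not filled in; a sketch under the same assumptions (checkpoint path and `--task` id are guesses following the other tasks):

```bash
# Sketch only: checkpoint path and task id are assumptions.
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 1 --split val
```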

3: To test on VCR QA->R:
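
Again a sketch under the same assumptions, with the QA->R checkpoint and task id swapped in:

```bash
# Sketch only: checkpoint path and task id are assumptions.
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 2 --split val
```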

### RefCOCO+

1: Download the model pretrained on the `RefCOCO+` objective and put it under `save`.

2: We use the pre-computed detections/masks from [MAttNet](https://github.com/lichengunc/MAttNet) for the fully-automatic comprehension task; check the MAttNet repository for more details.

3: To test on the RefCOCO+ val set, use the following command:

```bash
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/refcoco+_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 4
```

## Visiolinguistic Pre-training

Once you have extracted all the image features, train the model:
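
The pre-training command is not given; a sketch assuming a Conceptual Captions pre-training script named `train_concap.py` (the script name and all flags are assumptions):

```bash
# Sketch only: script name and flags are assumptions.
python train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --save_name pretrained
```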

To train the model in a distributed setting:
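
A sketch using PyTorch's standard `torch.distributed.launch` utility; the script-level flags remain assumptions:

```bash
# Sketch only: eight GPUs on one node; script name and flags are assumptions.
python -m torch.distributed.launch --nproc_per_node=8 train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --save_name pretrained --distributed
```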

## TASKS