add and test on the evaluation of Refcoco+
jiasenlu committed Aug 22, 2019
1 parent 06ccee0 commit 2cd6ca5
Showing 4 changed files with 75 additions and 12 deletions.
73 changes: 65 additions & 8 deletions README.md
@@ -3,6 +3,10 @@
Code and pre-trained models for **ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks**.



*Note: This is a beta release.*


## Repository Setup

1. Create a fresh conda environment, and install all dependencies.
@@ -42,32 +46,85 @@ Check `README.md` under `data` for more details.
|ViLBERT 6-Layer| RefCOCO+ |[Link]()|
|ViLBERT 6-Layer| Image Retrieval |[Link]()|

### Zero-Shot Image Retrieval

We can directly use the pre-trained ViLBERT model for zero-shot image retrieval on Flickr30k.

1: Download the pretrained model with objective `Conceptual Caption` and put it under `save`

2: Update `features_h5path1` and `val_annotations_jsonpath` in `vlbert_task.yml` to load the Flickr30k test-set image features and JSON file (the default points to the training features).

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

```bash
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot
```
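For reference, image retrieval on Flickr30k is conventionally reported as Recall@K: the fraction of queries whose ground-truth image ranks in the top K candidates. A minimal, generic sketch of that metric (illustrative, not the repository's exact scoring code):

```python
def recall_at_k(rankings, k):
    """rankings[i] is the 1-based rank of the ground-truth image
    for query i, after sorting candidates by model score."""
    hits = sum(1 for r in rankings if r <= k)
    return hits / len(rankings)

# Example: ranks of the correct image for 5 text queries.
ranks = [1, 3, 7, 2, 30]
r1, r5, r10 = (recall_at_k(ranks, k) for k in (1, 5, 10))
```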

### Image Retrieval

1: Download the pretrained model with objective `Image Retrieval` and put it under `save`

2: Update `features_h5path1` and `val_annotations_jsonpath` in `vlbert_task.yml` to load the Flickr30k test-set image features and JSON file (the default points to the training features).

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

```bash
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1
```

### VQA

1: Download the pretrained model with objective `VQA` and put it under `save`

2: To test on the held-out validation split, use the following command:

```
```
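For context, VQA accuracy is conventionally computed against the ten human answers collected per question: a predicted answer scores `min(#matching annotators / 3, 1)`. A generic sketch of that metric (not this repository's evaluation code):

```python
def vqa_accuracy(predicted, human_answers):
    """Standard VQA accuracy: an answer is fully correct if at
    least 3 of the 10 annotators gave it."""
    matches = sum(1 for a in human_answers if a == predicted)
    return min(matches / 3.0, 1.0)
```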

### VCR

1: Download the pretrained model with objective `VCR` and put it under `save`

2: To test on VCR Q->A

```
```

3: To test on VCR QA->R

```
```
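VCR is a four-way multiple-choice task reported as three numbers: Q->A accuracy, QA->R accuracy, and the joint Q->AR score, which credits an example only when both the answer and the rationale are predicted correctly. A generic sketch of how those three are aggregated (assumed, not the repository's code):

```python
def vcr_scores(qa_correct, qar_correct):
    """qa_correct / qar_correct are parallel lists of booleans:
    whether each example's answer / rationale was predicted right.
    Returns (Q->A, QA->R, joint Q->AR) accuracies."""
    n = len(qa_correct)
    qa = sum(qa_correct) / n
    qar = sum(qar_correct) / n
    joint = sum(a and r for a, r in zip(qa_correct, qar_correct)) / n
    return qa, qar, joint
```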

### RefCOCO+

1: Download the pretrained model with objective `RefCOCO+` and put it under `save`

2: We use the pre-computed detections/masks from [MAttNet](https://github.com/lichengunc/MAttNet) for the fully automatic comprehension task; check the MAttNet repository for more details.

3: To test on the RefCOCO+ val set, use the following command:

```bash
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/refcoco+_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 4
```
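RefCOCO+ comprehension is scored by whether the box predicted for a referring expression overlaps the ground-truth box with IoU above 0.5. A self-contained sketch of that criterion (illustrative, not the repository's exact code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def is_correct(pred_box, gt_box, threshold=0.5):
    """A prediction counts as correct when IoU exceeds the threshold."""
    return iou(pred_box, gt_box) > threshold
```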

## Visiolinguistic Pre-training

Once you have extracted all the image features, train the model with:

```
```

To train the model in a distributed setting:

```
```



## TASKS

1 change: 0 additions & 1 deletion eval_tasks.py
@@ -19,7 +19,6 @@
 import torch.nn as nn
 
-from pytorch_pretrained_bert.optimization import BertAdam, WarmupLinearSchedule
 
 from vilbert.task_utils import LoadDatasetEval, LoadLosses, ForwardModelsTrain, ForwardModelsVal, EvaluatingModel
 
 import vilbert.utils as utils
3 changes: 3 additions & 0 deletions vilbert/datasets/refer_expression_dataset.py
@@ -79,6 +79,9 @@ def __init__(

         self.max_region_num = max_region_num
 
+        if not os.path.exists(os.path.join(dataroot, "cache")):
+            os.makedirs(os.path.join(dataroot, "cache"))
+
         cache_path = os.path.join(dataroot, "cache", task + '_' + split + '_' + str(max_seq_length) + "_" + str(max_region_num) + '.pkl')
         if not os.path.exists(cache_path):
             self.tokenize()
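The guard added above (check `os.path.exists`, then `os.makedirs`) works, but it races if two processes create the cache concurrently; `os.makedirs(..., exist_ok=True)` expresses the same intent without the check-then-create gap. A small sketch under a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as dataroot:
    cache_dir = os.path.join(dataroot, "cache")
    # Equivalent to the exists()/makedirs() pair, without the race:
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(cache_dir, exist_ok=True)  # second call is a no-op
    created = os.path.isdir(cache_dir)
```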
10 changes: 7 additions & 3 deletions vilbert/datasets/vqa_dataset.py
@@ -100,8 +100,8 @@ def __init__(
     ):
         super().__init__()
         self.split = split
-        ans2label_path = os.path.join('data', task, "cache", "trainval_ans2label.pkl")
-        label2ans_path = os.path.join('data', task, "cache", "trainval_label2ans.pkl")
+        ans2label_path = os.path.join(dataroot, "cache", "trainval_ans2label.pkl")
+        label2ans_path = os.path.join(dataroot, "cache", "trainval_label2ans.pkl")
         self.ans2label = cPickle.load(open(ans2label_path, "rb"))
         self.label2ans = cPickle.load(open(label2ans_path, "rb"))
         self.num_labels = len(self.ans2label)
@@ -110,7 +110,7 @@ def __init__(
         self._image_features_reader = image_features_reader
         self._tokenizer = tokenizer
         self._padding_index = padding_index
-        cache_path = os.path.join('data', task, "cache", task + '_' + split + '_' + str(max_seq_length) + '.pkl')
+
+        if not os.path.exists(os.path.join(dataroot, "cache")):
+            os.makedirs(os.path.join(dataroot, "cache"))
+
+        cache_path = os.path.join(dataroot, "cache", task + '_' + split + '_' + str(max_seq_length) + '.pkl')
         if not os.path.exists(cache_path):
             self.entries = _load_dataset(dataroot, split)
             self.tokenize(max_seq_length)
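The change above routes every cache path through the `dataroot` argument instead of a hard-coded `'data'` prefix, so datasets cached under a non-default root resolve correctly. A minimal sketch of that cache round-trip (names and entries are illustrative):

```python
import os
import pickle
import tempfile

def cache_path(dataroot, task, split, max_seq_length):
    # Mirrors the pattern in the diff: everything keyed off dataroot.
    return os.path.join(dataroot, "cache",
                        task + '_' + split + '_' + str(max_seq_length) + '.pkl')

with tempfile.TemporaryDirectory() as dataroot:
    path = cache_path(dataroot, "VQA", "trainval", 23)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    entries = [{"question": "what color?", "answer": "blue"}]
    with open(path, "wb") as f:
        pickle.dump(entries, f)   # first run: build and cache
    with open(path, "rb") as f:
        reloaded = pickle.load(f)  # later runs: load from cache
```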
