Code and pre-trained models for **ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks**.

*Note: This is a beta release.*

## Repository Setup

1. Create a fresh conda environment, and install all dependencies; a sketch of this step follows.
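
A minimal sketch of this step, assuming Python 3.6 and a `requirements.txt` at the repository root (both are assumptions):

```bash
# Create and activate a fresh environment (Python version is an assumption).
conda create -n vilbert python=3.6
conda activate vilbert
# Install the repository's dependencies (assumes requirements.txt exists).
pip install -r requirements.txt
```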

Check `README.md` under `data` for more details.

|Model|Objective|Link|
|:---:|:---:|:---:|
|ViLBERT 6-Layer| RefCOCO+ |[Link]()|
|ViLBERT 6-Layer| Image Retrieval |[Link]()|

### Zero-Shot Image Retrieval

We can directly use the pre-trained ViLBERT model for zero-shot image retrieval on Flickr30k.

1: Download the model pretrained on the `Conceptual Caption` objective and put it under `save`.

2: Update `features_h5path1` and `val_annotations_jsonpath` in `vlbert_task.yml` to load the Flickr30k test-set image features and JSON file (the default is the training features); a config sketch follows these steps.

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

```bash
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot
```
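
The step-2 edit might look like the following fragment of `vlbert_task.yml`; the task key and file paths are placeholders, not values taken from this README:

```yaml
TASK3:                                                # assumed entry for the retrieval task
  features_h5path1: data/flickr30k/test_features.h5   # placeholder: test-set image features
  val_annotations_jsonpath: data/flickr30k/test.json  # placeholder: test-set annotations
```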

### Image Retrieval

1: Download the model pretrained on the `Image Retrieval` objective and put it under `save`.

2: Update `features_h5path1` and `val_annotations_jsonpath` in `vlbert_task.yml` to load the Flickr30k test-set image features and JSON file (the default is the training features), as in the zero-shot section above.

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

```bash
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1
```

### VQA

1: Download the model pretrained on the `VQA` objective and put it under `save`.

2: To test on the held-out validation split, use the following command:
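
The command is not filled in here; a plausible sketch that follows the pattern of the other evaluation commands, where the checkpoint path, `--task` id, and split name are all assumptions:

```bash
# Sketch only: checkpoint path, task id, and split are assumptions.
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VQA_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 0 --split val
```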

### VCR

1: Download the model pretrained on the `VCR` objective and put it under `save`.

2: To test on VCR Q->A:
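
The command is likewise not filled in; a sketch under the same assumptions (checkpoint path and `--task` id are guesses following the other tasks):

```bash
# Sketch only: checkpoint path and task id are assumptions.
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 1 --split val
```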

3: To test on VCR QA->R:
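
Again a sketch under the same assumptions, with the QA->R checkpoint and task id swapped in:

```bash
# Sketch only: checkpoint path and task id are assumptions.
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 2 --split val
```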

### RefCOCO+

1: Download the model pretrained on the `RefCOCO+` objective and put it under `save`.

2: We use the pre-computed detections/masks from [MAttNet](https://github.com/lichengunc/MAttNet) for the fully-automatic comprehension task; check the MAttNet repository for more details.

3: To test on the RefCOCO+ val set, use the following command:

```bash
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/refcoco+_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 4
```

## Visiolinguistic Pre-training

Once you have extracted all the image features, train the model:
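
The pre-training command is not given; a sketch assuming a Conceptual Captions pre-training script named `train_concap.py` (the script name and all flags are assumptions):

```bash
# Sketch only: script name and flags are assumptions.
python train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --save_name pretrained
```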

To train the model in a distributed setting:
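
A sketch using PyTorch's standard `torch.distributed.launch` utility; the script-level flags remain assumptions:

```bash
# Sketch only: eight GPUs on one node; script name and flags are assumptions.
python -m torch.distributed.launch --nproc_per_node=8 train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --save_name pretrained --distributed
```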

## TASKS