# OpenFlamingo Evaluation Suite

This is the evaluation module of OpenFlamingo. It contains a set of utilities for evaluating multimodal models on various benchmark datasets.

*This module is a work in progress! We will be updating this README as it develops. In the meantime, if you notice an issue, please file a Bug Report or Feature Request [here](https://github.com/mlfoundations/open_flamingo/issues/new/choose).*
When evaluating a model with `num_shots` in-context examples, we sample the exemplars from the training split. Performance is evaluated on a disjoint test split, subsampled to `--num_samples` examples (or using the full test split if `--num_samples=-1`).
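
The sampling scheme above can be sketched in a few lines. This is an illustrative sketch only: `sample_eval_instances` and its arguments are hypothetical names, not the module's actual API.

```python
import random

def sample_eval_instances(train_split, test_split, num_shots, num_samples, seed=0):
    """Illustrative sketch of the few-shot sampling described above.

    Exemplars are drawn from the training split; evaluation runs on the
    disjoint test split, subsampled to num_samples (-1 means use it all).
    """
    rng = random.Random(seed)
    shots = rng.sample(train_split, num_shots)          # in-context exemplars
    if num_samples == -1:
        eval_set = list(test_split)                     # full test split
    else:
        eval_set = rng.sample(test_split, num_samples)  # subsampled test split
    return shots, eval_set
```
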
## Supported models
This evaluation module interfaces with models using the `EvalModel` class defined in `eval/eval_models/eval_model.py`. The `EvalModel` wrapper standardizes the generation and rank classification interfaces.
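
A wrapper in this style exposes one method per interface, roughly as below. The class and method names here are a hypothetical sketch for illustration, not the actual contents of `eval/eval_models/eval_model.py`.

```python
from abc import ABC, abstractmethod

class EvalModelSketch(ABC):
    """Hypothetical sketch of an EvalModel-style wrapper (names are
    illustrative, not the repository's actual API)."""

    @abstractmethod
    def get_outputs(self, batch_text, batch_images, max_new_tokens):
        """Generation interface: return one generated string per example."""

    @abstractmethod
    def get_rank_classifications(self, batch_text, batch_images, class_names):
        """Rank-classification interface: return, for each example, a score
        (e.g. a log-likelihood) for every candidate class name."""
```

Concrete wrappers subclass this and translate each call into the underlying model's own generation and scoring calls, so the evaluation loop never touches model-specific code.
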
To help standardize VLM evaluations, we have implemented EvalModel wrappers for models from three code repositories:
* This open_flamingo repository, i.e. all models created using this repository's `src` code
* The pretrained [BLIP-2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) models. Note that BLIP-2 can only take one image per input sequence; it should not be confused with the BLIP-like implementation in the open_flamingo repository, which can handle arbitrarily interleaved image/text sequences

Our codebase uses DistributedDataParallel to parallelize evaluation by default, so please make sure to set the `MASTER_ADDR` and `MASTER_PORT` environment variables or use `torchrun`. We provide a sample Slurm evaluation script in `open_flamingo/open_flamingo/scripts/run_eval.sh`.
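
For a single-node run, the setup can look like the fragment below. The entry-point path and flag values are illustrative assumptions; adapt them to your checkout and hardware.

```shell
# Either export the rendezvous variables yourself...
export MASTER_ADDR=localhost
export MASTER_PORT=29500

# ...or let torchrun manage them, launching one process per GPU
# (entry-point path and flags shown here are illustrative):
torchrun --nproc_per_node=4 open_flamingo/eval/evaluate.py \
    --num_shots 4 \
    --num_samples -1
```
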