Merge pull request #6 from boomb0om/dev
Installation fix
Showing 2 changed files with 53 additions and 14 deletions.
**README.md**
@@ -3,7 +3,7 @@
This project aims to unify the evaluation of generative text-to-image models and provide the ability to quickly and easily calculate the most popular metrics.

Goals of this benchmark:
-- **Unified** metrics and datasets for all models
+- **Unified** metrics and datasets for all text-to-image models
- **Reproducible** results
- **User-friendly** interface for the most popular metrics: FID and CLIP-score
@@ -17,6 +17,7 @@ Goals of this benchmark:
- [Examples](#examples)
- [Documentation](#documentation)
- [Contribution](#contribution)
+- [TO-DO](#to-do)
- [Contacts](#contacts)
- [Citing](#citing)
- [Acknowledgments](#acknowledgments)
@@ -25,8 +26,8 @@ Goals of this benchmark:
Generative text-to-image models have become a popular and widely used tool.
There are many articles on text-to-image generation that present new, more advanced models.
-However, there is still no uniform way to measure the quality of such models.
-To address this issue, we provide an implementation of metrics to compare the quality of generative models.
+**However, there is still no uniform way to measure the quality of such models.**
+To address this issue, we provide an implementation of metrics and a dataset to compare the quality of generative models.

We propose using the MS-COCO FID-30K metric together with OpenAI's CLIP score, which has already become a standard for measuring the quality of text-to-image models.
We provide the MS-COCO validation subset and precalculated metrics for it.
@@ -38,18 +39,19 @@ You can easily contribute your model into benchmark and make FID results reprodu
- Standardized FID calculation: fixed image preprocessing and InceptionV3 model.
- FID-30K on the MS-COCO validation set: we provide the dataset on [huggingface🤗](https://huggingface.co/datasets/stasstaf/MS-COCO-validation), [precomputed FID stats](https://github.com/boomb0om/text2image-benchmark/releases/download/v0.0.1/MS-COCO_val2014_fid_stats.npz), and the fixed [30000 captions from MS-COCO](https://github.com/boomb0om/text2image-benchmark/releases/download/v0.0.1/MS-COCO_val2014_30k_captions.csv) that should be used to generate images
- Implementations of different popular text-to-image models to make metrics **reproducible**
- CLIP-score calculation
- User-friendly metrics calculation (check out [Getting started](#getting-started))
## Installation

```bash
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/boomb0om/text2image-benchmark
```

## Getting started

### Metrics: FID

Calculate FID for two sets of images:
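A minimal sketch of this usage, assuming `calculate_fid` accepts two local image directories and returns the FID value plus auxiliary data, mirroring the `calculate_fid(path, stats)` call removed in the diff below (both paths are placeholders):

```python
from T2IBenchmark import calculate_fid

# Assumption: calculate_fid can compare two image folders directly
# and returns (fid_value, extra_data).
fid, fid_data = calculate_fid(
    'path/to/real/images/',       # placeholder path
    'path/to/generated/images/'   # placeholder path
)
print(f"FID: {fid}")
```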
@@ -80,15 +82,28 @@ pip install -r T2IBenchmark/models/kandinsky21/requirements.txt
```diff
-from T2IBenchmark import calculate_fid
-from T2IBenchmark.datasets import get_coco_fid_stats
+from T2IBenchmark import calculate_coco_fid
+from T2IBenchmark.models.kandinsky21 import Kandinsky21Wrapper

-fid, _ = calculate_fid(
-    'path/to/your/generations/',
-    get_coco_fid_stats()
+fid, fid_data = calculate_coco_fid(
+    Kandinsky21Wrapper,
+    device='cuda:0',
+    save_generations_dir='coco_generations/'
 )
```
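Here `calculate_coco_fid` takes a model wrapper class, generates images on the given device for the fixed set of 30000 MS-COCO captions, saves them to `save_generations_dir`, and computes FID against the precomputed MS-COCO statistics, which makes the resulting FID-30K reproducible.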
### Metrics: CLIP-score

Example of calculating the CLIP-score for a set of images and a fixed prompt:
```python
from glob import glob

from T2IBenchmark import calculate_clip_score

cat_paths = glob('../assets/images/cats/*.jpg')
captions_mapping = {path: "a cat" for path in cat_paths}
clip_score = calculate_clip_score(cat_paths, captions_mapping=captions_mapping)
```
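Since `captions_mapping` pairs each image path with a caption, per-image prompts can be scored the same way by mapping every path to its own text.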
## Project Structure

@@ -98,25 +113,50 @@ fid, _ = calculate_fid(
- `feature_extractors/` - Implementations of the different neural nets used to extract features from images
- `metrics/` - Implementations of metrics
- `utils/` - Some utils
- `tests/` - Tests
- `docs/` - Documentation
-- `examples/` - Usage examples
-- `experiments/` - Experiments
+- `examples/` - Benchmark usage examples
+- `experiments/` - Experiments with metrics
- `assets/` - Assets

## Examples

Usage examples are listed below in the recommended order of study:
- [Basic FID usage](examples/FID_basic.ipynb)
- [Advanced FID usage](examples/FID_advanced.ipynb)
- [CLIP score](examples/CLIP_score_usage.ipynb)
- [FID calculation on MS-COCO](examples/FID-30k_on_MS-COCO.ipynb)
- [Using ModelWrapper to measure MS-COCO FID-30k](examples/ModelWrapper_FID-30k.ipynb)

## Documentation

- [FID.md](docs/FID.md) - Explanation of the different parameters that affect FID calculation
## Contribution

If you want to contribute your model to this benchmark and publish its metrics, follow these steps:

1) Create a fork of this repository
2) Create a wrapper for your model that inherits from the `T2IModelWrapper` class; a rough sketch is shown below
3) Generate images and calculate metrics using `calculate_coco_fid`. For more information, see [this example](examples/ModelWrapper_FID-30k.ipynb)
4) Create a pull request with your model
5) Congrats!
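As a rough illustration of step 2, a wrapper might look like the sketch below. The method names `load_model` and `generate`, and the import path of `T2IModelWrapper`, are assumptions here; the authoritative interface is defined in the repository and demonstrated in the linked example. The blank image stands in for a real sampling call.

```python
from PIL import Image

from T2IBenchmark import calculate_coco_fid
from T2IBenchmark.models import T2IModelWrapper  # assumed import path


class MyModelWrapper(T2IModelWrapper):
    """Hypothetical wrapper for a custom text-to-image model."""

    def load_model(self, device: str) -> None:
        # Assumed hook: load your model weights onto the target device here.
        self.device = device

    def generate(self, caption: str) -> Image.Image:
        # Assumed hook: replace this placeholder with a real sampling call.
        return Image.new('RGB', (512, 512))


# Same entry point as in the Kandinsky 2.1 example above.
fid, fid_data = calculate_coco_fid(
    MyModelWrapper,
    device='cuda:0',
    save_generations_dir='my_model_generations/'
)
```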
## TO-DO

- [ ] Implementation of Inception Score (IS) and Kernel Inception Distance (KID)
- [ ] FID-CLIPscore metric and plots
- [ ] Implementation and FIDs for [Kandinsky 2.X](https://github.com/ai-forever/Kandinsky-2) models with the help of Sber AI
- [ ] Implementation and FIDs for popular models from [diffusers](https://github.com/huggingface/diffusers): Stable Diffusion, IF
## Contacts

Authors:
- Pavlov Igor, [github](https://github.com/boomb0om)
- Artyom Ivanov, [github](https://github.com/UsefulTornado)
- Stanislav Stafievskiy, [github](https://github.com/stasstaf)

If you have any questions, please email `[email protected]`.

## Citing
**requirements.txt**
@@ -8,5 +8,4 @@ pillow
 datasets
 opencv-python
 ftfy
-regex
-git+https://github.com/openai/CLIP.git
+regex
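Removing the CLIP dependency from `requirements.txt` matches the README's installation instructions, which install CLIP from GitHub as an explicit separate step; this is likely the installation fix referenced in the pull request title.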