The following datasets can be loaded with the current code once they have been downloaded (see the example scripts):
| FR Dataset | Description | NR Dataset | Description |
|---|---|---|---|
| PIPAL | 2AFC | FLIVE(PaQ-2-PiQ) | Tech & Aesthetic |
| BAPPS | 2AFC | SPAQ | Mobile |
| PieAPP | 2AFC | AVA | Aesthetic |
| KADID-10k | | KonIQ-10k(++) | |
| LIVEM | | LIVEChallenge | |
| LIVE | | PIQ2023 | Portrait dataset |
| TID2013 | | GFIQA | Face IQA Dataset |
| TID2008 | | | |
| CSIQ | | | |
For more details, please see Awesome Image Quality Assessment.
Here are some other resources to download the dataset:
We create general interfaces for FR and NR datasets in `pyiqa/data/general_fr_dataset.py` and `pyiqa/data/general_nr_dataset.py`. The main arguments are:

- `opt` contains all dataset options, including
  - `dataroot_target`: path of the target image folder.
  - `dataroot_ref [optional]`: path of the reference image folder.
  - `meta_info_file`: file containing meta information of images, including relative image paths, mos labels and other labels.
  - `augment [optional]`: data augmentation transform list
    - `hflip`: flip input images or pairs
    - `random_crop`: int or tuple, random crop input images or pairs
  - `split_file [optional]`: `train/val/test` split file `*.pkl`. If not specified, the split information in the meta csv file is used, or the whole dataset is loaded.
  - `split_index [optional]`: `str` or `int`, which split to use. Valid when `split_file` is specified or the corresponding split information exists in the meta csv file.
  - `dmos_max`: some datasets use difference of mos (dmos). Setting this to a non-zero value changes dmos to mos with `mos = dmos_max - dmos`.
  - `phase`: phase labels [train, val, test]
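As a minimal sketch, the options above might be collected in a plain dictionary like the one below. All paths and values are hypothetical placeholders, and the nesting of the `augment` entries and the use of a column name for `split_index` are assumptions based on the list above, not a confirmed schema. The helper `dmos_to_mos` is also hypothetical and only illustrates the `mos = dmos_max - dmos` conversion.

```python
# Hypothetical option dictionary for a full-reference (FR) dataset.
# All paths and values are placeholders, not shipped defaults.
opt = {
    "dataroot_target": "./datasets/example/dist_imgs",  # target (distorted) images
    "dataroot_ref": "./datasets/example/ref_imgs",      # optional, FR only
    "meta_info_file": "./datasets/meta_info/example.csv",
    "augment": {"hflip": True, "random_crop": 224},     # optional; layout is assumed
    "split_index": "official_split",                    # optional; assumed column name
    "dmos_max": 100,                                    # non-zero: labels are dmos
    "phase": "train",
}


def dmos_to_mos(dmos, dmos_max):
    """Convert a dmos label to mos; a dmos_max of 0 leaves the label unchanged."""
    return dmos_max - dmos if dmos_max else dmos


print(dmos_to_mos(30.0, opt["dmos_max"]))  # 70.0
```

With `dmos_max` left at zero, the helper returns the label untouched, which mirrors the "set this to non-zero" wording above.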
The above interface requires the `meta_info_file` to provide the dataset information and the train/val/test split. The `meta_info_file` is a `.csv` file with the following general format:
- For NR datasets: name, mos(mean), std, split_name
```
100.bmp 32.56107532210109 19.12472638223644 train/val/test
```
- For FR datasets: ref_name, dist_name, mos(mean), std, split_name
```
I01.bmp I01_01_1.bmp 5.51429 0.13013 train/val/test
```
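To illustrate how such a meta file could be consumed, here is a small sketch that filters the rows of an NR meta file by split. It assumes a comma-delimited file with a header row; the header names (`img_name`, `mos`, `std`, `official_split`), the second data row, and the helper `read_nr_meta` are invented for illustration and are not the toolbox's own loader.

```python
import csv
import io

# Hypothetical NR meta file content. The header names and the comma delimiter
# are assumptions; the second data row is made up for the example.
NR_META = """img_name,mos,std,official_split
100.bmp,32.56107532210109,19.12472638223644,train
101.bmp,48.2,17.5,test
"""


def read_nr_meta(text, split_column, phase):
    """Return (row_index, image_name, mos) for rows whose split column equals phase."""
    reader = csv.DictReader(io.StringIO(text))
    samples = []
    for row_index, row in enumerate(reader):  # row_index counts data rows from 0
        if row[split_column] == phase:
            samples.append((row_index, row["img_name"], float(row["mos"])))
    return samples


train_samples = read_nr_meta(NR_META, "official_split", "train")
print(train_samples)
```

An FR meta file would be handled the same way, with one extra column for the reference image name.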
Note that we generate `train/val/test` splits following the principles below:
- For datasets that have official splits, we follow their splits.
- For official splits that have no `val` part, e.g., the AVA dataset, we randomly separate 5% of the training data as validation.
- For small datasets that require n-split results, we use a `train:val=8:2` ratio.
- All random seeds are set to `123` when needed.
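The 8:2 rule with a fixed seed can be sketched as follows. This is only an illustration using Python's standard `random` module; the function name `random_split` is hypothetical, and the repository's own scripts may use a different random number generator, so the exact indexes produced here should not be expected to match the shipped split files.

```python
import random


def random_split(num_samples, train_ratio=0.8, seed=123):
    """Shuffle row indexes with a fixed seed and cut them into train/val parts."""
    indexes = list(range(num_samples))
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    rng.shuffle(indexes)
    num_train = round(num_samples * train_ratio)
    return {
        "train": sorted(indexes[:num_train]),
        "val": sorted(indexes[num_train:]),
    }


split = random_split(10)
print(len(split["train"]), len(split["val"]))  # 8 2
```

Because the seed is fixed, calling `random_split` again with the same arguments returns the same partition.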
According to these rules, the `split_name` is named as follows:
- The official split is saved in a column named `official_split`.
- [if necessary] Ten random splits are generated and stored using the format `ratio[split_ratio]_seed[seed number]_split[split index:02d]`. For example, for a split ratio of `train/val/test=8:0:2`, a seed number of 123, and the first split, the entry would be `ratio802_seed123_split01`.
- You can also use other custom split names, such as the `ILGnet_split` for the AVA dataset.
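The naming pattern for random splits can be reproduced with a short formatting helper. The function `split_column_name` is a hypothetical name used only for this sketch; it simply follows the `ratio[split_ratio]_seed[seed number]_split[split index:02d]` pattern described above.

```python
def split_column_name(ratio, seed, split_index):
    """Build a name such as ratio802_seed123_split01 from a (train, val, test) ratio."""
    ratio_tag = "".join(str(part) for part in ratio)
    return f"ratio{ratio_tag}_seed{seed}_split{split_index:02d}"


# Ten random splits with train/val/test = 8:0:2 and seed 123.
names = [split_column_name((8, 0, 2), 123, i) for i in range(1, 11)]
print(names[0])   # ratio802_seed123_split01
print(names[-1])  # ratio802_seed123_split10
```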
You may also use the `split_file` to specify the split information. A `split_file` is a `.pkl` file that contains the `train/val/test` information as a Python dictionary in the following format:
```
{
    train_index: {
        train: [train_index_list]
        val: [val_index_list] # blank if no validation split
        test: [test_index_list] # blank if no test split
    }
}
```
The `train_index` starts from `1`, and the sample indexes correspond to the row indexes of the `meta_info_file`, starting from `0`. We have already generated these files for mainstream public datasets with the scripts in the `./scripts/` folder.
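A split file in this layout could be written and read back with the standard `pickle` module, as in the sketch below. The file name, the index values, and the use of empty lists for "blank" splits are assumptions made for illustration; the outer keys start from 1 and the inner lists hold 0-based row indexes, as described above.

```python
import os
import pickle
import tempfile

# Hypothetical split dictionary: outer keys are split numbers starting from 1,
# inner lists hold 0-based row indexes into meta_info_file.
splits = {
    1: {"train": [0, 1, 2, 3], "val": [], "test": [4]},
    2: {"train": [1, 2, 3, 4], "val": [], "test": [0]},
}

# Write the dictionary to a temporary .pkl file, then load it again.
split_path = os.path.join(tempfile.mkdtemp(), "example_split.pkl")
with open(split_path, "wb") as f:
    pickle.dump(splits, f)

with open(split_path, "rb") as f:
    loaded = pickle.load(f)

split_index, phase = 1, "train"
print(loaded[split_index][phase])  # [0, 1, 2, 3]
```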
Some of the supported datasets have different label formats and file organizations, so we create specific dataloaders for them:
- Live Challenge. The first 7 samples are usually removed in related works.
- AVA. Different label formats.
- PieAPP. Different label formats.
- BAPPS. Different label formats.
You may use `tests/test_datasets.py` to test whether a dataset can be loaded correctly.