Dataset Preparation

Dataset Preparation

Supported Datasets

The following datasets can be loaded with the current codes after downloaded (see example scripts):

FR Dataset	Description	NR Dataset	Description
PIPAL	2AFC	FLIVE(PaQ-2-PiQ)	Tech & Aesthetic
BAPPS	2AFC	SPAQ	Mobile
PieAPP	2AFC	AVA	Aesthetic
KADID-10k		KonIQ-10k(++)
LIVEM		LIVEChallenge
LIVE		PIQ2023	Portrait dataset
TID2013		GFIQA	Face IQA Dataset
TID2008
CSIQ

Please see more details at Awesome Image Quality Assessment

Resources

Here are some other resources to download the dataset:

Our huggingface archive 🤗
Waterloo Bayesian IQA project. [ IQA-Dataset | download links ]

Interface of Dataloader

We create general interfaces for FR and NR datasets in pyiqa/data/general_fr_dataset.py and pyiqa/data/general_nr_dataset.py. The main arguments are

opt contains all dataset options, including
- dataroot_target: path of target image folder.
- dataroot_ref [optional]: path of reference image folder.
- meta_info_file: file containing meta information of images, including relative image paths, mos labels and other labels.
- augment [optional] data augmentation transform list
  - hflip: flip input images or pairs
  - random_crop: int or tuple, random crop input images or pairs
- split_file [optional]: train/val/test split file *.pkl. If not specified, will use the split information in meta csv file or load the whole dataset.
- split_index [optional]: str or int, which split to use, valid when split_file is specified or corresponding split information exits in meta csv file.
- dmos max: some dataset use difference of mos. Set this to non-zero will change dmos to mos with mos = dmos_max - dmos.
- phase: phase labels [train, val, test]

The above interface requires the meta_info_file to provide the dataset information and the train/val/test split. The meta_info_file are .csv files, and has the following general format

- For NR datasets: name, mos(mean), std, split_name
    ```
    100.bmp   	32.56107532210109   	19.12472638223644   train/val/test
    ```

- For FR datasets: ref_name, dist_name, mos(mean), std, split_name 
    ```
    I01.bmp        I01_01_1.bmp   5.51429        0.13013 train/val/test

    ```

Note that we generate train/val/test splits follow the principles below:

For datasets which has official splits, we follow their splits.
For official split which has no val part, e.g., AVA dataset, we random separate 5% from training data as validation.
For small datasets which requires n-split results, we use train:val=8:2 ratio.
All random seeds are set to 123 when needed.

According to these rules, the split_name is named as follows:

The official split is saved in a column named official_split.
[if necessary] Ten random splits are generated and stored using the format ratio[split_ratio]_seed[seed number]_split[split index:02d]. For example, for a split ratio of train/val/test=8:0:2, a seed number of 123, and the first split, the entry would be ratio802_seed123_split01.
You can also use other custom split names, such as the ILGnet_split for the AVA dataset.

Using separate split file

You may also use the split_file to specify the split information. The split_file are .pkl files which contains the train/val/test information with python dictionary in the following format:

{
    train_index: {
        train: [train_index_list]
        val: [val_index_list] # blank if no validation split
        test: [test_index_list] # blank if no test split
    }
}

The train_index starts from 1. And the sample indexes correspond to the row index of meta_info_file, starting from 0. We already generate the files for mainstream public datasets with scripts in folder ./scripts/.

Specific Datasets and Dataloader

Some of the supported datasets have different label formats and file organizations, and we create specific dataloader for them:

Live Challenge. The first 7 samples are usually removed in the related works.
AVA. Different label formats.
PieAPP. Different label formats.
BAPPS. Different label formats.

Test Dataloader

You may use tests/test_datasets.py to test whether a dataset can be correctly loaded.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!