The dataset should be in the following format:
- A parent directory with your dataset name, which contains the files labels.csv and info.json
- Preprocessed images (128x128) in a folder named images inside the parent directory
labels.csv should have at least these 3 columns:
- FILE_NAME
- CATEGORY
- SUPER_CATEGORY
(see example datasets)
If you choose different column names, specify them in info.json.
info.json should contain the following information about the dataset:
- dataset_name: string
- dataset_description: string
- total_categories: integer
- total_super_categories: integer
- uniform_number_of_images_per_category: boolean
- minimum_images_per_category: integer
- median_images_per_category: float
- maximum_images_per_category: integer
- has_super_categories: boolean
- image_column_name: string
- category_column_name: string
- super_category_column_name: string
(see example datasets)
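The fields listed above can mostly be derived from the label rows themselves. The following is a minimal sketch, not part of this repository: `build_info` is a hypothetical helper, and it assumes that an empty value or the literal string NAN means "no super-category" and that a dataset without super-categories reports 0 for total_super_categories.

```python
import json
from collections import Counter
from statistics import median


def build_info(rows, dataset_name, dataset_description,
               image_col="FILE_NAME", category_col="CATEGORY",
               super_category_col="SUPER_CATEGORY"):
    """Derive the info.json fields from label rows (a non-empty list of dicts)."""
    # Number of images per category.
    counts = Counter(row[category_col] for row in rows)
    per_category = list(counts.values())
    # Assumption: empty values and "NAN" mean that no super-category is defined.
    supers = {row[super_category_col] for row in rows
              if row.get(super_category_col) not in (None, "", "NAN")}
    return {
        "dataset_name": dataset_name,
        "dataset_description": dataset_description,
        "total_categories": len(counts),
        "total_super_categories": len(supers),
        "uniform_number_of_images_per_category": len(set(per_category)) == 1,
        "minimum_images_per_category": min(per_category),
        "median_images_per_category": float(median(per_category)),
        "maximum_images_per_category": max(per_category),
        "has_super_categories": bool(supers),
        "image_column_name": image_col,
        "category_column_name": category_col,
        "super_category_column_name": super_category_col,
    }


rows = [
    {"FILE_NAME": "insect1.jpg", "CATEGORY": "bee", "SUPER_CATEGORY": "insect"},
    {"FILE_NAME": "insect2.jpg", "CATEGORY": "wasp", "SUPER_CATEGORY": "insect"},
    {"FILE_NAME": "insect3.jpg", "CATEGORY": "butterfly", "SUPER_CATEGORY": "insect"},
]
info = build_info(rows, "insects", "three-image example")
# json.dumps writes valid JSON (no trailing commas), ready to save as info.json.
print(json.dumps(info, indent=2))
```

If you use different column names, pass them through the `*_col` parameters so that the generated column-name fields match your labels.csv.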
insects/
├── labels.csv
├── info.json
└── images/
    ├── insect1.jpg
    ├── insect2.jpg
    └── insect3.jpg
With predefined super-category
FILE_NAME, CATEGORY, SUPER_CATEGORY
insect1.jpg, bee, insect
insect2.jpg, wasp, insect
insect3.jpg, butterfly, insect
Without predefined super-category
FILE_NAME, CATEGORY, SUPER_CATEGORY
insect1.jpg, bee, NAN
insect2.jpg, wasp, NAN
insect3.jpg, butterfly, NAN
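The sample rows above put a space after each comma and use NAN where no super-category is defined. As a rough sketch of how such a file might be read with Python's standard csv module (the `read_labels` helper is hypothetical, and mapping NAN to `None` is an assumption about how a consumer would want to treat it):

```python
import csv
import io


def read_labels(csv_file, super_category_col="SUPER_CATEGORY"):
    """Read label rows, ignoring spaces after commas and mapping NAN to None."""
    # skipinitialspace drops the blank that follows each comma, in the
    # header line as well as in the data rows.
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    rows = []
    for row in reader:
        row = {key: (value or "").strip() for key, value in row.items()}
        if row[super_category_col].upper() == "NAN":
            row[super_category_col] = None
        rows.append(row)
    return rows


sample = io.StringIO(
    "FILE_NAME, CATEGORY, SUPER_CATEGORY\n"
    "insect1.jpg, bee, NAN\n"
    "insect2.jpg, wasp, NAN\n"
    "insect3.jpg, butterfly, NAN\n"
)
rows = read_labels(sample)
for row in rows:
    print(row["FILE_NAME"], row["CATEGORY"], row["SUPER_CATEGORY"])
```

For a real dataset, replace the in-memory sample with `open(path, newline="")`, as recommended in the csv module documentation.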
{
"dataset_name" : "mini_insect_1",
"dataset_description" : "mini insect example dataset # 1",
"total_categories" : 4,
"total_super_categories" : 1,
"uniform_number_of_images_per_category" : true,
"minimum_images_per_category" : 3,
"median_images_per_category" : 3.0,
"maximum_images_per_category" : 3,
"has_super_categories" : true,
"image_column_name" : "FILE_NAME",
"category_column_name" : "CATEGORY",
"super_category_column_name" : "SUPER_CATEGORY"
}
You will find three sample datasets in this repository:
- mini_insect_1: mini dataset with Super Categories
- mini_insect_2: mini dataset without Super Categories
To check that your dataset is correctly formatted before running the Factsheet Script, use the Python script check_data_format.py.
You can run the script as follows:
python check_data_format.py --dataset_path './mini_insect_1'
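For a quick look at the layout before running the script, the structural requirements described in this document can be sketched in a few lines of Python. This is not the implementation of check_data_format.py, whose actual checks may differ; `find_format_problems` is a hypothetical helper that only tests for the two files, the images folder, and whether info.json parses as JSON.

```python
import json
from pathlib import Path


def find_format_problems(dataset_path):
    """Return a list of layout problems; an empty list means none were found."""
    root = Path(dataset_path)
    problems = []
    # The parent directory must contain labels.csv and info.json.
    for name in ("labels.csv", "info.json"):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    # Images live in a folder named images inside the parent directory.
    if not (root / "images").is_dir():
        problems.append("missing folder: images")
    # info.json must be valid JSON (a trailing comma, for example, is not).
    info_path = root / "info.json"
    if info_path.is_file():
        try:
            json.loads(info_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as error:
            problems.append(f"info.json is not valid JSON: {error}")
    return problems


if __name__ == "__main__":
    for problem in find_format_problems("./mini_insect_1"):
        print(problem)
```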