This repository processes the AVA dataset to train an object detector for the person class only.
The AVA dataset can also be used for many other tasks besides person detection; please check the official documentation for details.
- Make a directory named 'dataset' in the current working directory.
- Click here and download 'ava_v2.2.zip' into the 'dataset' directory, then extract the files.
- 'ava_v2.2.zip' includes various annotation files.
```
# Run setup.sh for simplification.
~$ bash setup.sh
```
Install the required Python packages from PyPI by running:

```
~$ pip install -r requirements.txt
```
- 'ava_youtube_download.py' downloads the YouTube videos named in the video name lists.
- 'cut_frames_from_video.py' cuts frames from the videos and produces both images and bbox annotations (the annotations are written in CSV format).

```
# Run the scripts in the order below.
~$ python ava_youtube_download.py
~$ python cut_frames_from_video.py
```
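The frame-cutting step hinges on mapping AVA's annotation timestamps (keyframes at 1-second intervals, measured from the start of the original YouTube video) to frame indices in the downloaded file. A minimal sketch of that mapping; the function name is mine, not taken from the repo's script:

```python
# Sketch: map an AVA annotation timestamp to a video frame index.
# Assumes the downloaded video starts at t = 0 of the original upload.

def timestamp_to_frame_index(timestamp_sec: float, fps: float) -> int:
    """Return the frame index closest to an annotation timestamp."""
    return round(timestamp_sec * fps)

# With OpenCV you would then seek to that frame, e.g.:
#   cap.set(cv2.CAP_PROP_POS_FRAMES, timestamp_to_frame_index(ts, fps))
print(timestamp_to_frame_index(902, 30))  # 27060
```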
- The AVA v2.2 dataset contains 430 videos: 235 for training, 64 for validation, and 131 for test.
- Most of the label information concerns human localization and action recognition.
- To obtain the human bbox labels, use 'ava_train_v2.2.csv', 'ava_val_v2.2.csv', and 'ava_test_v2.2.txt'.
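Each row of those CSVs carries one action label for one person box, so the same box repeats once per action; for person detection you only need one box per (video, timestamp, person). A small sketch of that de-duplication, assuming the AVA column layout of video_id, timestamp, x1, y1, x2, y2, action_id, person_id with coordinates normalized to [0, 1] (the sample values below are made up for illustration):

```python
import csv
import io

# Two rows: the same person box annotated with two different actions.
SAMPLE_CSV = """-5KQ66BBWC4,0902,0.077,0.151,0.283,0.811,80,1
-5KQ66BBWC4,0902,0.077,0.151,0.283,0.811,9,1
"""

def load_person_boxes(csv_text):
    """Keep one box per (video_id, timestamp, person_id), dropping action ids."""
    boxes = {}
    for video_id, ts, x1, y1, x2, y2, _action, pid in csv.reader(io.StringIO(csv_text)):
        boxes[(video_id, ts, pid)] = tuple(float(v) for v in (x1, y1, x2, y2))
    return boxes

print(len(load_person_boxes(SAMPLE_CSV)))  # 1 unique person box
```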
- In my case, I will use the YOLOv5 format, as in this tutorial.

```
# For example, in a .txt label file:
# class | center x | center y | width | height
0 0.002 0.118 0.714 0.977
```
- To get the YOLO-format annotations, run:

```
~$ python cvt_annotation_format_csv_to_txt.py
```
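The core of that conversion is simple: AVA stores normalized corner coordinates (x1, y1, x2, y2), while YOLO expects a normalized center plus width and height. A sketch under that assumption (the function name is mine, not from the repo's script):

```python
def corners_to_yolo(x1, y1, x2, y2):
    """Convert normalized (x1, y1, x2, y2) corners to YOLO (cx, cy, w, h)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

# Person class id is 0; one line per box in the .txt label file.
cx, cy, w, h = corners_to_yolo(0.1, 0.2, 0.5, 0.8)
print(f"0 {cx:.3f} {cy:.3f} {w:.3f} {h:.3f}")  # 0 0.300 0.500 0.400 0.600
```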
- To check the results:

```
~$ python label_test.py
```
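A label check like this typically maps each normalized YOLO box back to pixel corners before drawing it on the image. A minimal sketch, independent of the repo's actual label_test.py:

```python
def yolo_to_pixel_corners(cx, cy, w, h, img_w, img_h):
    """Map a normalized YOLO box back to integer pixel corners for drawing."""
    x1 = int((cx - w / 2) * img_w)
    y1 = int((cy - h / 2) * img_h)
    x2 = int((cx + w / 2) * img_w)
    y2 = int((cy + h / 2) * img_h)
    return x1, y1, x2, y2

# e.g. cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
print(yolo_to_pixel_corners(0.5, 0.5, 0.5, 0.5, 100, 200))  # (25, 50, 75, 150)
```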
Downloading the train and val datasets by running ava_youtube_download.py can take quite a long time. In my case, I therefore downloaded and used only the val dataset; that is why the train-set downloading and processing code blocks are commented out in ava_youtube_download.py and cut_frames_from_video.py.
If you want to use all of the data, uncomment these calls:
- `loop_Download(unique_trainNames, trainDir_path)` in 'ava_youtube_download.py'
- `loop_getImageFrame(train_list, trainVideo_path, train_imgPath, unique_trainNames, train_labelPath)` in 'cut_frames_from_video.py'
I trained a model on this dataset for the person detection task. The custom-trained model was uploaded here; you can also find there how to set up the environment for your own custom training.
[1] AVA Dataset Downloader Script, alainary, GitHub / referenced for the video downloading and frame-cutting approach
[2] AVA dataset / official dataset homepage
[3] yolov4-tiny-tflite-for-person-detection, GitHub / the person detector I trained