For ease of implementation, I have not reproduced the paper exactly. The items below are implemented differently from the paper:
- Backbone network (Xception is used instead of the network described in the paper)
- Learning rate schedule (`tf.keras.optimizers.schedules.ExponentialDecay` is used; see the sketch after this list)
- Data augmentations
- Hyperparameters
- And so on ...
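As a minimal sketch of the learning rate schedule mentioned above: the concrete numbers and the choice of Adam below are only illustrative assumptions, since the real values come from ./configs/configs.py and the `--init_lr` / `--lr_decay_rate` / `--lr_decay_steps` options described later.

```python
import tensorflow as tf

# Illustrative values only; the actual ones are configured via
# ./configs/configs.py or the --init_lr / --lr_decay_rate / --lr_decay_steps options.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,  # --init_lr
    decay_steps=10000,           # --lr_decay_steps
    decay_rate=0.9,              # --lr_decay_rate
)

# The schedule is passed to the optimizer in place of a fixed learning rate.
# (Adam here is an assumption, not necessarily the optimizer used in this repo.)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```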
Build the Docker image:

`docker build -t ${ImageName}:${ImageTag} .`

- Example of running a container from the built image:

`docker run -d -it --gpus all --shm-size=${ShmSize} ${ImageName}:${ImageTag} /bin/bash`
Training Dataset: Pascal VOC Dataset (Link)
Pascal VOC Dataset with TFDS
| Dataset | Train | Validation | Test |
|---|---|---|---|
| Pascal VOC 2007 | 2501 | 2510 | 4952 (used as validation) |
| Pascal VOC 2012 | 5717 | 5823 | 10991 (no labels) |
- Training Set: VOC2007 trainval + VOC2012 trainval (Total: 16551)
- Validation Set: VOC2007 test (Total: 4952)
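Since the dataset is loaded through TFDS, the split above can be assembled roughly as follows. This is only a sketch of the split logic; preprocessing, label encoding, and augmentation are handled by the repo's own data pipeline.

```python
import tensorflow_datasets as tfds

# Training set: VOC2007 trainval + VOC2012 trainval (16551 images in total).
voc2007_trainval = tfds.load("voc/2007", split="train+validation")
voc2012_trainval = tfds.load("voc/2012", split="train+validation")
train_ds = voc2007_trainval.concatenate(voc2012_trainval)

# Validation set: VOC2007 test (4952 images).
val_ds = tfds.load("voc/2007", split="test")
```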
Trained with the default values of this repo (total epochs: 105).
- Download pb file: <Google Drive Link>

The pb file is uploaded as a tar.gz archive, so you have to decompress it as below:

`tar -zxvf yolo_voc_448x448.tar.gz`

If you want to run inference with this pb file, refer to inference_tutorial.ipynb.
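For a quick idea of what the notebook does, a SavedModel exported as a pb directory can be loaded like this. The 448x448 input size is taken from the file name; the [0, 1] scaling and the output format are assumptions, so follow inference_tutorial.ipynb for the exact pre- and post-processing.

```python
import tensorflow as tf

# Load the decompressed SavedModel directory.
model = tf.saved_model.load("yolo_voc_448x448")

# Assumption: a batch of 448x448 RGB images scaled to [0, 1] is expected.
image = tf.io.decode_jpeg(tf.io.read_file("sample.jpg"), channels=3)
image = tf.image.resize(tf.cast(image, tf.float32) / 255.0, (448, 448))

outputs = model(image[tf.newaxis, ...])  # decode boxes/classes as shown in the notebook
```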
class name | AP |
---|---|
dog | 0.7464 |
pottedplant | 0.2250 |
car | 0.5021 |
person | 0.4482 |
tvmonitor | 0.5213 |
diningtable | 0.4564 |
bicycle | 0.5927 |
chair | 0.2041 |
motorbike | 0.5595 |
sofa | 0.4801 |
bus | 0.6215 |
boat | 0.3274 |
horse | 0.7049 |
aeroplane | 0.5872 |
sheep | 0.4223 |
bottle | 0.1312 |
train | 0.7917 |
cat | 0.8044 |
bird | 0.4824 |
cow | 0.4548 |
mAP | 0.5032 |
- GPU (GeForce RTX 3090): about 8 FPS
- CPU (AMD Ryzen 5 5600X 6-Core Processor): about 4 FPS
Train on the Pascal VOC dataset:

`python train_voc.py`
Options
Default option values are defined in ./configs/configs.py. If an option is given on the command line, the corresponding default config value is overridden.

- `--epochs`: Number of training epochs
- `--init_lr`: Initial learning rate
- `--lr_decay_rate`: Learning rate decay rate
- `--lr_decay_steps`: Learning rate decay steps
- `--batch_size`: Training batch size
- `--val_step`: Validation interval during training
- `--tb_img_max_outputs`: Number of visualized prediction images in TensorBoard
- `--train_ds_sample_ratio`: Training dataset sampling ratio
- `--val_ds_sample_ratio`: Validation dataset sampling ratio
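For example, a run that overrides a few defaults could look like the following (the flags are the ones listed above; the values other than `--epochs 105` are only illustrative):

`python train_voc.py --epochs 105 --init_lr 1e-4 --batch_size 16 --val_step 5`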
Evaluate the pretrained model with the VOC2007 test dataset:

`python eval_voc.py`
Options
- `--batch_size`: Evaluation batch size (default: the `batch_size` value in ./configs/configs.py)
- `--pb_dir`: Directory path of the saved pb file (default: `./ckpts/voc_ckpts/yolo_voc_448x448`)
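For example, evaluating a pb file placed at the default location with a smaller batch size (the batch size value is only illustrative):

`python eval_voc.py --batch_size 8 --pb_dir ./ckpts/voc_ckpts/yolo_voc_448x448`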
You Only Look Once: Unified, Real-Time Object Detection <arxiv link>
@misc{redmon2016look,
title={You Only Look Once: Unified, Real-Time Object Detection},
author={Joseph Redmon and Santosh Divvala and Ross Girshick and Ali Farhadi},
year={2016},
eprint={1506.02640},
archivePrefix={arXiv},
primaryClass={cs.CV}
}