Deep learning using TensorFlow low-level APIs.
Build your own convolutional neural networks using TensorFlow.
Supports image classification and semantic segmentation tasks.
DCGAN is now available.
Post-training quantization is now supported.
Verified on Windows 10 and Ubuntu 18.04 using PyCharm with Anaconda.
Check out the instructions.
VGGNet demo: Edit classification/parameters_vgg.py and run classification/demo_vgg.py
- Download all the files.
- Prepare your data using scripts in subsets/.
- Build your own networks by modifying scripts in models/.
- Edit parameters.py to change the dataset, model, directories, etc.
- Run train.py to train the model.
- Run test.py to test the trained model.
- Use inference.py if the test data has no labels.
- Run quantize.py to perform post-training quantization (pydot package required).
Images and labels should be paired and stored in the same directory (default).
- Open terminal and cd to MyConvNet.
- Download, extract, and process datasets:
- CUB-200-2011: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
- python -m subsets.cub_200_2011 --data /path/to/raw/data --dest /path/to/processed/data
- ImageNet: http://image-net.org/challenges/LSVRC/2012/downloads (log-in required)
- python -m subsets.ilsvrc_2012_cls --data /path/to/raw/data --dest /path/to/processed/data
- And so on.
Some scripts may not support command-line execution.
- If you have no NVIDIA GPU, set the 'num_gpus' parameter to 0 to run training/inference on the CPU.
- Our RandomResizedCrop pads the image before cropping so that each side of the image is ≥ √(max_scale·H·W).
- Set padding=False for random_resized_crop() to use RandomResizedCrop without padding.
- We are running an experiment inspired by "Fixing the train-test resolution discrepancy".
- It extends the RandomResizedCrop scale range from [0.08, 1.0] to [0.04, 1.96] (including padding).
- In the segmentation task, pixels with a value of 0 are ignored, so assign 1 to the first class.
- Use Linux for faster training.
- Multi-GPU training is available based on the parameter server strategy.
- NCCL-based distributed training code is currently not available (nccl/).
- Batch statistics of multiple devices are updated successively.
- Check out REFERENCES.md for papers and code references.
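The padding rule noted above for RandomResizedCrop (each side ≥ √(max_scale·H·W)) can be sketched as follows. This is a minimal NumPy illustration, not the code in this repository: the helper name `pad_for_random_resized_crop`, the symmetric placement of the padding, and the zero fill value are all assumptions.

```python
import math

import numpy as np


def pad_for_random_resized_crop(image, max_scale=1.0):
    """Pad an H x W (x C) image so that each side is >= sqrt(max_scale * H * W).

    Hypothetical helper for illustration only; symmetric zero padding is an
    assumption, not necessarily what random_resized_crop() does internally.
    """
    h, w = image.shape[:2]
    # Largest square side a crop of area max_scale * H * W could need.
    min_side = math.ceil(math.sqrt(max_scale * h * w))
    pad_h = max(min_side - h, 0)
    pad_w = max(min_side - w, 0)
    # Split the padding evenly between both borders of each spatial axis.
    pad_width = [(pad_h // 2, pad_h - pad_h // 2),
                 (pad_w // 2, pad_w - pad_w // 2)]
    # Leave any trailing axes (e.g. channels) untouched.
    pad_width += [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_width, mode="constant", constant_values=0)


image = np.ones((100, 400, 3), dtype=np.uint8)
padded = pad_for_random_resized_crop(image, max_scale=1.0)
print(padded.shape)
```

With max_scale greater than 1 (as in the extended scale range above), both sides may need padding, since the sampled crop area can exceed the area of the original image.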
- Python: 3.7
- tensorflow-gpu: >= 1.14.0 (cudatoolkit: 10.0, cudnn: 7.6.5)
- numpy: 1.17.4
- scikit-image: 0.15.0
- scikit-learn: 0.22
- matplotlib: 3.1.1
- opencv-python: 4.1.2.30 (installed with pip)
- pydot: 1.4.1 (graphviz: 2.40.1)
- Speedup: Training is slower than tf_cnn_benchmark.
- Object detection task.
- Multi-model optimization including knowledge distillation.
Model | Top-1 Acc | Top-5 Acc | Train (Test) Image/Input Size | Details | Param | Ckpt |
---|---|---|---|---|---|---|
ResNet-v1.5-50 | 76.35% | 92.94% | 224/224 (256/224) | Inception preprocessing (baseline) | *.py | *.zip |
ResNet-v1.5-50 | 76.50% | 93.06% | 224/224 (256/224) | + 30 epochs (120 in total) | *.py | *.zip |
ResNet-v1.5-50 | 77.02% | 93.24% | 224/224 (256/224) | + Cosine LR, decoupled WD 4e-5, dropout 0.3 | *.py | *.zip |
ResNet-v1.5-50 | 77.51% | 93.80% | 224/224 (256†/224) | + Extended crop scale [0.08, 1.0] -> [0.04, 1.96] | *.py | *.zip |
EfficientNet-B0 | 76.82% | 93.21% | 224/224 (256/224) | Baseline (terminated at epoch 330 due to instability) | *.py | *.zip |
EfficientNet-B0 | 77.01% | 93.42% | 224/224 (256†/224) | + 30 epochs (380 in total), extended crop scale | *.py | *.zip |
EfficientNet-Lite0 | 75.36% (75.22%‡) | 92.63% (92.43%‡) | 224/224 (256†/224) | 380 epochs, extended crop scale | *.py | *.zip |
EfficientNet-Lite0 | 75.62% (75.16%‡) | 92.62% (92.33%‡) | 224/224 (256†/224) | kernel_size=4 for stride=2 convolutions | *.py | *.zip |
- The reported accuracies are single-crop validation scores.
- Note that the class numbers are ordered by the synset IDs (train.txt, val.txt). Refer to ilsvrc_2012_cls.py and this page.
- Therefore, the class ordering is different from the one in the devkit.
- Image size is the size of an image after preprocessing; input size is the size of the network's input.
- If image and input sizes do not match, cropping or padding is performed.
- Training scores are calculated with augmentation, while validation is performed with exponential moving average (EMA).
- As a result, validation scores can surpass training scores in the training curves.
- EMA is known to play a crucial role in training EfficientNet.
- † The crop method is slightly different: a center crop of a √(HW) × √(HW) region, followed by zero padding and resizing.
- ‡ Accuracy after post-training quantization.
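The crop-and-pad part of the † method can be sketched as below. This is only one reading of the note, not the repository's implementation: it assumes that zero padding fills the parts of the √(HW) × √(HW) square that fall outside the image, that the side length is rounded to the nearest integer, and that the final resize (for example to 256) is done afterwards with any image library. The function name `center_square_crop_with_padding` is hypothetical.

```python
import math

import numpy as np


def center_square_crop_with_padding(image):
    """Return a centered sqrt(H*W) x sqrt(H*W) square of an H x W (x C) image.

    Illustrative assumption: regions of the square that lie outside the
    image are filled with zeros. Resizing is intentionally left out.
    """
    h, w = image.shape[:2]
    side = int(round(math.sqrt(h * w)))  # rounding is an assumption
    # Crop along the axes that are longer than the square side.
    crop_h, crop_w = min(h, side), min(w, side)
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    cropped = image[top:top + crop_h, left:left + crop_w]
    # Zero-pad along the axes that are shorter than the square side.
    pad_h, pad_w = side - crop_h, side - crop_w
    pad_width = [(pad_h // 2, pad_h - pad_h // 2),
                 (pad_w // 2, pad_w - pad_w // 2)]
    pad_width += [(0, 0)] * (image.ndim - 2)
    return np.pad(cropped, pad_width, mode="constant", constant_values=0)


image = np.ones((100, 400, 3), dtype=np.uint8)
square = center_square_crop_with_padding(image)
print(square.shape)
```

Because the square has the same area as the original image, a wide image loses some columns on the left and right and gains zero rows at the top and bottom, so the aspect ratio of the visible content is preserved before resizing.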