paniabhisek/maxout

This is an attempt to replicate the following paper, since the hyperparameter link given in the paper is no longer working.

Maxout Networks, arXiv:1302.4389 [stat.ML]

Dataset and Device Info

The models are trained and evaluated on MNIST: the first 50,000 training images are used for training, the remaining 10,000 for validation, and the standard 10,000-image test set for testing.

The following diagram (maxout-mlp.png) shows the maxout module with multilayer perceptrons.
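For reference, a maxout unit takes the maximum over k affine "pieces" of its input, so the activation function is learned rather than fixed. A minimal PyTorch-style sketch (hypothetical; the repository's actual implementation may differ):

```python
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout layer: each output unit is the max over k linear pieces."""

    def __init__(self, in_features, out_features, k=2):
        super().__init__()
        self.out_features, self.k = out_features, k
        # a single linear map produces all k pieces for every output unit
        self.linear = nn.Linear(in_features, out_features * k)

    def forward(self, x):
        z = self.linear(x)                         # (batch, out_features * k)
        z = z.view(-1, self.out_features, self.k)  # (batch, out_features, k)
        return z.max(dim=-1).values                # max over the k pieces
```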

MLP + Dropout

How to Run

  • Train (first 50,000 training examples): python mnist.py --mlp 1 --train true
  • Validate (remaining 10,000 training examples): python mnist.py --mlp 1 --valid true
  • Continue training (whole training set, starting from the previous weights): python mnist.py --mlp 1 --train_cont true
  • Test: python mnist.py --mlp 1 --test true
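
The flags above might be wired up with argparse roughly like this (a hypothetical sketch; the real mnist.py may differ):

```python
import argparse

parser = argparse.ArgumentParser(description="Maxout networks on MNIST")
parser.add_argument("--mlp", type=int, default=0, help="use the MLP + dropout model")
parser.add_argument("--conv", type=int, default=0, help="use the 3 conv + MLP model")
# note: bool("true") and bool("false") are both True in Python, so any
# non-empty value such as "true" switches these modes on
parser.add_argument("--train", type=bool, default=False, help="train on the first 50,000 examples")
parser.add_argument("--valid", type=bool, default=False, help="evaluate on the held-out 10,000 examples")
parser.add_argument("--train_cont", type=bool, default=False, help="continue training on the whole training set")
parser.add_argument("--test", type=bool, default=False, help="evaluate on the test set")
args = parser.parse_args()
```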

For the complete hyperparameter tuning, see the hyper-tuning.rst file.

  • Learning rate: 0.005
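
Put together, the best configuration reported below (Layer1: 4 layers of 2048 neurons; Layer2: 2 layers of 10 neurons) might look like this. This is a hedged sketch: it assumes PyTorch, reuses the Maxout module sketched earlier, and reflects my reading of the Layer1/Layer2 columns rather than the repository's definitive architecture.

```python
import torch.nn as nn
from torch.optim import SGD

model = nn.Sequential(
    nn.Flatten(),                        # 28 x 28 MNIST image -> 784 vector
    Maxout(784, 2048), nn.Dropout(0.5),  # Layer1: 4 maxout layers, 2048 units
    Maxout(2048, 2048), nn.Dropout(0.5),
    Maxout(2048, 2048), nn.Dropout(0.5),
    Maxout(2048, 2048), nn.Dropout(0.5),
    Maxout(2048, 10),                    # Layer2: 2 layers ending in 10 logits
    nn.Linear(10, 10),
)
optimizer = SGD(model.parameters(), lr=0.005)  # learning rate from above
```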

Training

| Epochs | Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss   |
|--------|------------|---------------|----------------|---------------|----------------|--------------|--------|
| 5      | 64         | 4             | 2048           | 2             | 10             | 97.79        | 1.5060 |
| 5      | 64         | 4             | 1024           | 2             | 10             | 97.44        | 1.5107 |
5 64 4 1024 2 10 97.44 1.5107

Validation

| Training epochs | Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss   |
|-----------------|------------|---------------|----------------|---------------|----------------|--------------|--------|
| 5               | 64         | 4             | 2048           | 2             | 10             | 96.94        | 1.5097 |
| 5               | 64         | 4             | 1024           | 2             | 10             | 96.83        | 1.5108 |

The model was then trained further on the whole training dataset, giving the following accuracy and loss.

Training with pretrained weights

| Epochs | Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss   |
|--------|------------|---------------|----------------|---------------|----------------|--------------|--------|
| 5      | 64         | 4             | 2048           | 2             | 10             | 99.02        | 1.4827 |

Testing

| Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss   |
|------------|---------------|----------------|---------------|----------------|--------------|--------|
| 64         | 4             | 2048           | 2             | 10             | 97.17        | 1.5007 |

3 Conv + MLP

The following diagram (maxout-conv.png) shows the convolutional maxout model.

How to Run

  • Train (50,000 shuffled training examples): python mnist.py --conv 1 --train true
  • Validate (remaining 10,000 training examples): python mnist.py --conv 1 --valid true
  • Continue training (whole training set, starting from the previous weights): python mnist.py --conv 1 --train_cont true
  • Test: python mnist.py --conv 1 --test true

Learning Rate

The learning rate is initially set to 0.01 and halved at epoch 5 while training on the 50,000 shuffled examples. The configuration with the lowest validation error is then retrained from the pretrained weights, this time starting from a learning rate of 0.001, again halved at epoch 5.
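
A minimal sketch of that schedule, assuming PyTorch (the repository may implement it differently): start at 0.01 and halve at epoch 5; for the continuation run, start from 0.001 instead.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5], gamma=0.5)

for epoch in range(10):
    # ... one full pass over the 50,000 shuffled training examples ...
    optimizer.step()   # stands in for the per-batch updates
    scheduler.step()   # lr is 0.01 for epochs 0-4, 0.005 afterwards
```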


The architecture presented in the paper is: conv -> maxpool -> conv -> maxpool -> conv -> maxpool -> MLP -> softmax. The MLP output size is 10 (one unit per class), and its input size is whatever flattened size comes out of the third maxpool. The only things I had to adjust were the kernel sizes and paddings of the convolutional layers, since those are the only free parameters left in the network.
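
The "MLP in" column in the tables below can be checked with a little arithmetic: a stride-1 convolution or pool maps a spatial size s to s + 2*pad - kernel + 1, starting from the 28 x 28 MNIST input. A short sketch (the helper names are mine, not the repo's):

```python
def out_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a conv/pool layer."""
    return (size + 2 * pad - kernel) // stride + 1

def mlp_in(convs, size=28, pool=2, pool_stride=1):
    """Flattened per-feature-map size after three conv + maxpool stages."""
    for kernel, pad in convs:
        size = out_size(size, kernel, pad)           # convolution
        size = out_size(size, pool, 0, pool_stride)  # 2 x 2 maxpool, stride 1
    return size * size

print(mlp_in([(7, 3), (5, 2), (5, 2)]))  # 625 -> first row below
print(mlp_in([(5, 3), (5, 2), (5, 2)]))  # 729 -> second row
print(mlp_in([(5, 3), (3, 2), (3, 2)]))  # 961 -> third row
print(mlp_in([(5, 2), (3, 2), (3, 2)]))  # 841 -> fourth row
```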

Training

| Epochs | Batch | Conv1 kernel | Conv1 pad | Pool1 | Pool1 stride | Conv2 kernel | Conv2 pad | Pool2 | Pool2 stride | Conv3 kernel | Conv3 pad | Pool3 | Pool3 stride | MLP in | MLP out | Accuracy (%) | Loss   |
|--------|-------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------|---------|--------------|--------|
| 10     | 64    | 7 x 7        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 625    | 10      | 97.09        | 1.4921 |
| 10     | 64    | 5 x 5        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 729    | 10      | 87.62        | 1.5856 |
| 10     | 64    | 5 x 5        | 3         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 961    | 10      | 95.43        | 1.5088 |
| 10     | 64    | 5 x 5        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 841    | 10      | 95.96        | 1.5037 |

Validation

| Batch | Conv1 kernel | Conv1 pad | Pool1 | Pool1 stride | Conv2 kernel | Conv2 pad | Pool2 | Pool2 stride | Conv3 kernel | Conv3 pad | Pool3 | Pool3 stride | MLP in | MLP out | Accuracy (%) | Loss   |
|-------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------|---------|--------------|--------|
| 64    | 7 x 7        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 625    | 10      | 96.85        | 1.4928 |
| 64    | 5 x 5        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 729    | 10      | 87.76        | 1.5828 |
| 64    | 5 x 5        | 3         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 961    | 10      | 95.16        | 1.5828 |
| 64    | 5 x 5        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 841    | 10      | 96.15        | 1.5012 |

Training Continuation

| Epochs | Batch | Conv1 kernel | Conv1 pad | Pool1 | Pool1 stride | Conv2 kernel | Conv2 pad | Pool2 | Pool2 stride | Conv3 kernel | Conv3 pad | Pool3 | Pool3 stride | MLP in | MLP out | Accuracy (%) | Loss   |
|--------|-------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------|---------|--------------|--------|
| 10     | 64    | 7 x 7        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 625    | 10      | 97.58        | 1.4874 |
| 10     | 64    | 5 x 5        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 729    | 10      | 88.04        | 1.5811 |
| 10     | 64    | 5 x 5        | 3         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 961    | 10      | 96.25        | 1.5011 |
| 10     | 64    | 5 x 5        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 841    | 10      | 96.75        | 1.4960 |

Testing

| Batch | Conv1 kernel | Conv1 pad | Pool1 | Pool1 stride | Conv2 kernel | Conv2 pad | Pool2 | Pool2 stride | Conv3 kernel | Conv3 pad | Pool3 | Pool3 stride | MLP in | MLP out | Accuracy (%) | Loss   |
|-------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------------|-----------|-------|--------------|--------|---------|--------------|--------|
| 64    | 7 x 7        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 625    | 10      | 96.87        | 1.4929 |
| 64    | 5 x 5        | 3         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 5 x 5        | 2         | 2 x 2 | 1            | 729    | 10      | 87.39        | 1.5861 |
| 64    | 5 x 5        | 3         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 961    | 10      | 95.52        | 1.5070 |
| 64    | 5 x 5        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 3 x 3        | 2         | 2 x 2 | 1            | 841    | 10      | 96.30        | 1.4989 |
