Notes for different padding behaviors when stride=2, padding='SAME' #25

taehoonlee commented Aug 31, 2018

2x down-sampling is one of the important operations in reference models. However, a convolution or a pooling with stride=2, padding='SAME' may produce different outputs across deep learning libraries (e.g., TensorFlow, CNTK, Theano, Caffe, Torch, ...) due to their different padding behaviors.

For example (TensorNets syntax, but it can be read as pseudo code for other libraries),

```python
x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
x = conv(x, 64, 7, stride=2, padding='SAME')
```

produces a [None, 112, 112, 64] map. This example can be performed as either of the following cases:

Case 1 (asymmetric)

- Top-left: kernel_size // 2 - 1, bottom-right: kernel_size // 2
- The example implicitly works as:

```python
x = pad(x, [[0, 0], [2, 3], [2, 3], [0, 0]])   # 224 -> 229
x = conv(x, 64, 7, stride=2, padding='VALID')  # 229 -> 112
```

Case 2 (symmetric)

- All sides: kernel_size // 2
- The example implicitly works as:

```python
x = pad(x, [[0, 0], [3, 3], [3, 3], [0, 0]])   # 224 -> 230
x = conv(x, 64, 7, stride=2, padding='VALID')  # 230 -> 112 (the rightmost pixel is discarded)
```

As TensorNets translates original repositories written in various libraries, it must handle both padding behaviors to reproduce the original results exactly.

Results

I compared the performance of the two padding schemes on the 11 ResNet variants. Precisely, the two schemes for the ResNets are:

```python
# Case 1 (asymmetric)
x = conv2d(x, 64, 7, stride=2, padding='SAME')  # effectively symmetric for ResNet50,101,152v2 (299 inputs)
...
x = max_pool2d(x, 3, stride=2, padding='SAME')
...

# Case 2 (symmetric)
x = pad(x, [[0, 0], [3, 3], [3, 3], [0, 0]])
x = conv2d(x, 64, 7, stride=2, padding='VALID')
...
x = pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]])
x = max_pool2d(x, 3, stride=2, padding='VALID')
...
```

The results are summarized as follows:

| Model | Case 1 Top-1 | Case 1 Top-5 | Case 1 10-5 | Case 2 Top-1 | Case 2 Top-5 | Case 2 10-5 |
|---|---|---|---|---|---|---|
| ResNet50 | 25.436 | 8.098 | 6.950 | 25.126 | 7.982 | 6.842 |
| ResNet101 | 24.250 | 7.402 | 6.210 | 23.580 | 7.214 | 6.092 |
| ResNet152 | 23.860 | 7.098 | 6.068 | 23.396 | 6.882 | 5.908 |
| ResNet50v2 | 24.040 | 6.966 | 5.896 | 24.526 | 7.252 | 6.012 |
| ResNet101v2 | 22.766 | 6.184 | 5.158 | 23.116 | 6.488 | 5.230 |
| ResNet152v2 | 21.968 | 5.838 | 4.900 | 22.236 | 6.080 | 4.960 |
| ResNet200v2 | 22.286 | 6.056 | 4.902 | 21.714 | 5.848 | 4.830 |
| ResNeXt50c32 | 22.806 | 6.460 | 5.492 | 22.260 | 6.190 | 5.410 |
| ResNeXt101c32 | 21.660 | 6.068 | 4.996 | 21.270 | 5.706 | 4.842 |
| ResNeXt101c64 | 20.924 | 5.566 | 4.666 | 20.506 | 5.408 | 4.564 |
| WideResNet50 | 22.516 | 6.344 | 5.248 | 21.982 | 6.066 | 5.116 |

(Case 1 = asymmetric, Case 2 = symmetric; error rates in %, lower is better.)

All models except ResNet50,101,152v2 performed better with symmetric padding than with asymmetric padding. This is because only TensorFlow (as far as I know) uses asymmetric padding, and only ResNet50,101,152v2 were trained with TensorFlow. Note that:

- ResNet50,101,152 are translated from Caffe,
- ResNet50,101,152v2 are from TensorFlow,
- ResNet200v2 is from Torch,
- ResNeXt50c32, ResNeXt101c32, and ResNeXt101c64 are from PyTorch,
- WideResNet50 is from Torch.

Caffe definitely uses symmetric padding, and I infer that (Py)Torch does as well (I'm not familiar with Torch). Thus, in order to reproduce the original results, I changed the current symmetric paddings for pool1 in the ResNets to asymmetric paddings, but only for ResNet50,101,152v2. Since the conv1 of ResNet50,101,152v2 transforms 299 to 150, the SAME padding is already equivalent to the symmetric one, so conv1/pad was left untouched. Please see the commit :)
