Notes for different padding behaviors when stride=2, padding='SAME' #25

taehoonlee commented Aug 31, 2018

2x down-sampling is one of the important operations in reference models. However, a convolution or a pooling with stride=2, padding='SAME' may produce different outputs across deep learning libraries (e.g., TensorFlow, CNTK, Theano, Caffe, Torch, ...) due to their different padding behaviors.

For example (TensorNets syntax, but it can be read as pseudo code for other libraries),

```python
x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
x = conv(x, 64, 7, stride=2, padding='SAME')
```

produces a [None, 112, 112, 64] map. This example can be performed as either of the following cases:

Case 1 (asymmetric)

- Top-left: kernel_size // 2 - 1, bottom-right: kernel_size // 2
- The example implicitly works as:

```python
x = pad(x, [[0, 0], [2, 3], [2, 3], [0, 0]])   # 224 -> 229
x = conv(x, 64, 7, stride=2, padding='VALID')  # 229 -> 112
```

Case 2 (symmetric)

- All sides: kernel_size // 2
- The example implicitly works as:

```python
x = pad(x, [[0, 0], [3, 3], [3, 3], [0, 0]])   # 224 -> 230
x = conv(x, 64, 7, stride=2, padding='VALID')  # 230 -> 112 (the rightmost pixel is discarded)
```

As TensorNets translates original repositories written in various libraries, it must handle both padding behaviors to reproduce the original results exactly.

Results

I compared the performance of the two padding schemes on the 11 ResNet variants. Precisely, the two schemes for the ResNets are:

```python
# Case 1 (asymmetric)
x = conv2d(x, 64, 7, stride=2, padding='SAME')  # effectively symmetric for ResNet50,101,152v2 (299 inputs)
...
x = max_pool2d(x, 3, stride=2, padding='SAME')
...

# Case 2 (symmetric)
x = pad(x, [[0, 0], [3, 3], [3, 3], [0, 0]])
x = conv2d(x, 64, 7, stride=2, padding='VALID')
...
x = pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]])
x = max_pool2d(x, 3, stride=2, padding='VALID')
...
```

The results are summarized as follows:

| Model | Case 1 Top-1 | Case 1 Top-5 | Case 1 10-5 | Case 2 Top-1 | Case 2 Top-5 | Case 2 10-5 |
|---|---|---|---|---|---|---|
| ResNet50 | 25.436 | 8.098 | 6.950 | 25.126 | 7.982 | 6.842 |
| ResNet101 | 24.250 | 7.402 | 6.210 | 23.580 | 7.214 | 6.092 |
| ResNet152 | 23.860 | 7.098 | 6.068 | 23.396 | 6.882 | 5.908 |
| ResNet50v2 | 24.040 | 6.966 | 5.896 | 24.526 | 7.252 | 6.012 |
| ResNet101v2 | 22.766 | 6.184 | 5.158 | 23.116 | 6.488 | 5.230 |
| ResNet152v2 | 21.968 | 5.838 | 4.900 | 22.236 | 6.080 | 4.960 |
| ResNet200v2 | 22.286 | 6.056 | 4.902 | 21.714 | 5.848 | 4.830 |
| ResNeXt50c32 | 22.806 | 6.460 | 5.492 | 22.260 | 6.190 | 5.410 |
| ResNeXt101c32 | 21.660 | 6.068 | 4.996 | 21.270 | 5.706 | 4.842 |
| ResNeXt101c64 | 20.924 | 5.566 | 4.666 | 20.506 | 5.408 | 4.564 |
| WideResNet50 | 22.516 | 6.344 | 5.248 | 21.982 | 6.066 | 5.116 |

(Case 1 = asymmetric, Case 2 = symmetric; error rates in %, lower is better.)

All models except ResNet50,101,152v2 performed better with symmetric padding than with asymmetric padding. This is because only TensorFlow (as far as I know) uses asymmetric padding, and only ResNet50,101,152v2 were trained with TensorFlow. Note that:

- ResNet50,101,152 are translated from Caffe,
- ResNet50,101,152v2 are from TensorFlow,
- ResNet200v2 is from Torch,
- ResNeXt50c32, ResNeXt101c32, and ResNeXt101c64 are from PyTorch,
- WideResNet50 is from Torch.

Caffe definitely uses symmetric padding, and I infer that (Py)Torch does as well (I'm not familiar with Torch). Thus, in order to reproduce the original results, I changed the current symmetric paddings for pool1 in the ResNets to asymmetric paddings, but only for ResNet50,101,152v2. Since the conv1 of ResNet50,101,152v2 transforms 299 to 150, the SAME padding is already equivalent to the symmetric one, so conv1/pad was left untouched. Please see the commit :)
