An extension of deeplab-v2 (in TF) allowing for smoothed dilated convolutions


Efficient Smoothing of Dilated Convolutions for Image Segmentation

This is the code for reproducing the experiments of our project Efficient Smoothing of Dilated Convolutions for Image Segmentation. The code is based on the source code of the paper Smoothed Dilated Convolutions for Improved Dense Prediction, see original README below.


  • Konstantin Donhauser [donhausk at]
  • Manuel Fritsche [manuelf at]
  • Lorenz Kuhn [kuhnl at]
  • Thomas Ziegler [zieglert at]

Changes compared to the source repo

  • Addition of our proposed pre-filters:

    We added our proposed pre-filters (Average, Gaussian, and an Aggregation of them) in the file. Small changes also in the and file

  • Add training/validation iterations

    Add the possibility for cyclic training and validation. The model is trained for a certain number of steps, after which validation is performed. This is repeated until the defined number of iterations is reached. Changes made in and changes as well as new configuration parameter in

  • Extend the logging to Tensorboard

    During training and at the end of each validation, the IoU per class and the mean IoU (mIoU) are logged so they can be viewed in TensorBoard. Changes performed in

  • Scripts for easy use on ETHZ's Leonhard cluster

    With the command source one can set the parameters for the PASCAL VOC 2012 dataset.

    With the command source cityscapes one can set the parameters for the Cityscapes dataset.

    With the command sh one can start the training/validation. (Ensure that the datasets are located at the path defined in

    With the command sh one can clean the workspace after a training/validation run. All important log files will be combined in a tar file.
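The cyclic training/validation scheme described in the list above can be sketched in plain Python. This is only an illustration of the control flow, not the repository's TensorFlow code: `run_cycles`, `train_step`, `validate`, `steps_per_cycle` and `num_cycles` are hypothetical names introduced here.

```python
def run_cycles(train_step, validate, steps_per_cycle, num_cycles):
    """Alternate between a block of training steps and one validation pass.

    train_step(step) and validate(step) are hypothetical callables supplied
    by the caller; validate is assumed to return a metric such as the mIoU.
    """
    history = []
    step = 0
    for _ in range(num_cycles):
        # Train for a fixed number of steps ...
        for _ in range(steps_per_cycle):
            step += 1
            train_step(step)
        # ... then validate once and record the result for this cycle.
        history.append((step, validate(step)))
    return history
```

The returned list holds one `(global_step, metric)` pair per cycle, which is the kind of series one would log after each validation.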

How to run

  • Clone this Repo

  • Download the needed models and data sets

    The VOC_and_refModel file contains the augmented PASCAL VOC 2012 dataset and the pre-trained deeplabv2 models.

    The Cityscapes file contains the Cityscapes dataset.

  • Select desired pre-filter

    Select desired pre-filter in Default filter is our proposed Average filter. If one chooses Aggregation the batch size has to be reduced in the file. Furthermore make sure that the files of the pre-trained models are located as defined in the file (pretrain_file and checkpoint_file parameter).

  • Source your workspace

    Use the script to source the workspace. Make sure that either the datasets are at the locations specified in the file or change the file accordingly. Use command source or source cityscapes.

  • Start the training

    With sh one can start to train the model. For experiments on the Cityscapes dataset, the time limits in the bsub command need to be increased, e.g. -W 92:00. Change file accordingly. (If one does not run on ETHZ's Leonhard cluster one can start the training with python --option=train_test. One might want to clean the folder beforehand as it is done in the file.)

    During the training the current loss and mIoU for each training step is written to the log.txt file. This helps to detect numerical instabilities (e.g. exploding of the loss) early. The validation results after each iteration are also written into the log.txt file.

  • Collect results

    After the training is finished one can collect all relevant files by running sh The collection is moved to the home folder ~/.
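As a rough illustration of how the logged loss values could be checked for numerical instabilities, the sketch below flags the first loss that is non-finite or far above the recent average. It works on a plain list of floats; the function name, the `window` and `factor` thresholds are assumptions made for this example and are not part of the repository.

```python
import math


def loss_exploded(losses, window=5, factor=10.0):
    """Return the index of the first suspicious loss value, or None.

    A value is suspicious if it is NaN/inf or larger than `factor` times
    the mean of the preceding `window` values (hypothetical thresholds).
    """
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i
        recent = losses[max(0, i - window):i]
        if recent and loss > factor * (sum(recent) / len(recent)):
            return i
    return None
```

For example, `loss_exploded([1.0, 0.9, 0.8, 12.0])` flags index 3, because 12.0 is more than ten times the mean of the three preceding values.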

README of the source Repo

Smoothed Dilated Convolutions for Improved Dense Prediction

This is the code for reproducing experimental results in our paper Smoothed Dilated Convolutions for Improved Dense Prediction accepted for long presentation in KDD2018.

Created by Zhengyang Wang and Shuiwang Ji at Texas A&M University.

In this work, we propose smoothed dilated convolutions to address the gridding artifacts caused by dilated convolutions. Some results are shown below. Our methods improve the image semantic segmentation models, with only hundreds of extra training parameters. More details and experimental results will be added once the paper is published.
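The gridding effect can be seen in a toy 1D setting. The sketch below is plain Python, not the TensorFlow implementation from the paper or this repository: with dilation 2, an output at an even position only ever sees inputs at even positions, so an impulse at an odd position is invisible to it. Applying a small averaging filter first mixes neighbouring positions. The helper names and the window size of 3 are assumptions chosen for illustration.

```python
def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    k = len(kernel)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def average_filter1d(x, size):
    """Zero-padded moving average with an odd window size."""
    return dilated_conv1d(x, [1.0 / size] * size, dilation=1)


# Impulse at an odd position in an otherwise empty signal.
signal = [0.0] * 11
signal[5] = 1.0

plain = dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2)
smoothed = dilated_conv1d(average_filter1d(signal, 3), [1.0, 1.0, 1.0], dilation=2)

# Position 4 (even) ignores the impulse without smoothing, but not with it.
print(plain[4], smoothed[4])
```

Without the pre-filter, position 4 receives no contribution from the impulse at position 5; with it, the impulse leaks into positions 4 and 6 and therefore reaches the even outputs as well.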


If using this code, please cite our paper.

 title={Smoothed Dilated Convolutions for Improved Dense Prediction},
 author={Wang, Zhengyang and Ji, Shuiwang},
 booktitle={Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},


PASCAL mIoU: (results figure not preserved)

We perform the effective receptive field analysis to visualize the smoothing effect.

Effective Receptive Field Analysis: (figure not preserved)


The baseline is a (re-)implementation of DeepLab v2 (ResNet-101) in TensorFlow for semantic image segmentation on the PASCAL VOC 2012 dataset. We refer to DrSleep's implementation (Many thanks!). We do not use tf-to-caffe packages like kaffe, so you only need TensorFlow 1.3.0+ to run this code.

The deeplab pre-trained ResNet-101 ckpt files (pre-trained on MSCOCO) are provided by DrSleep -- here. Thanks again!



  • We implement our proposed smoothed dilated convolutions and insert them in the baseline. To use them, simply change 'dilated_type' in


  • A clarification:

As reported, ResNet pre-trained models (NOT deeplab) from Tensorflow were trained using the channel order RGB instead of BGR (

Thus, the most correct way to apply them is to use the same RGB order. The original code is for pre-trained models from Caffe and uses BGR. To correct this when you use res101 or res50, delete line 116 and line 117 in utils/ to remove the RGB-to-BGR step when reading images. Then modify line 77 in utils/ to remove the BGR-to-RGB step in the inverse process for image visualization. Finally, change the IMAGE_MEAN by swapping the first and the third values in line 26 and line 26 for non_msc and msc training, respectively.

However, this change does not affect the performance much, as shown by the discussion in issue 30. In this task, the size of the training patches differs from that in ImageNet, and the set of images is different, so the IMAGE_MEAN is never accurate. I guess that simply using IMAGE_MEAN=[127.5, 127.5, 127.5] will work as well.
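The point of the clarification above is that the channel order of the images and of the mean must agree. A minimal plain-Python sketch of that bookkeeping follows; the mean values are rounded placeholders chosen for illustration (an assumption, not read from this repository's files), and the helper names are hypothetical.

```python
def swap_first_and_third(channels):
    """Reverse a 3-channel sequence: RGB -> BGR or BGR -> RGB."""
    c0, c1, c2 = channels
    return (c2, c1, c0)


def subtract_mean(pixel, mean):
    """Subtract a per-channel mean; both must use the same channel order."""
    return tuple(p - m for p, m in zip(pixel, mean))


# Illustrative BGR mean (placeholder values, assumption).
IMAGE_MEAN_BGR = (104.0, 117.0, 123.0)
# Swapping the first and third values gives the RGB-ordered mean.
IMAGE_MEAN_RGB = swap_first_and_third(IMAGE_MEAN_BGR)

rgb_pixel = (123.0, 117.0, 104.0)
print(subtract_mean(rgb_pixel, IMAGE_MEAN_RGB))  # consistent order
print(subtract_mean(rgb_pixel, IMAGE_MEAN_BGR))  # mismatched order
```

With a matched order the example pixel is centred at zero in every channel; with a mismatched order the first and third channels are offset in opposite directions.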


  • Now the test code will output the mIoU as well as the IoU for each class.


  • Add 'predict' function: you can now use '--option=predict' to save your outputs (both the true prediction, where each pixel is between 0 and 20, and the visual one, where each class has its own color).

  • Add multi-scale training, testing and predicting. Check and and use them just as and

  • Add to use the log.txt to make plots of training curve.

  • Now this is a 'full' (re-)implementation of DeepLab v2 (ResNet-101) in TensorFlow. Thank you for the support. You are welcome to report your settings and results as well as any bug!


  • The new version enables using original ImageNet pre-trained ResNet models (without pre-training on MSCOCO). You may change arguments ('encoder_name' and 'pretrain_file') in to use corresponding pre-trained models. The original pre-trained ResNet-101 ckpt files are provided by tensorflow officially -- res101 and res50.

  • To help those who want to use this model on the CityScapes dataset, I shared the corresponding txt files and the python file which generates them. Note that you need to use tools here to generate labels with trainID first. Hope it would be helpful. Do not forget to change IMG_MEAN in and other settings in

  • The 'is_training' argument is removed and 'self._batch_norm' changes. Basically, for a small batch size, it is better to keep the statistics of the BN layers (running means and variances) frozen, and to not update the values provided by the pre-trained model, by setting 'is_training=False'. Note that is_training=False still updates the BN parameters gamma (scale) and beta (offset) if they are present in the var_list of the optimiser definition. Set 'trainable=False' in the BN functions to remove them from trainable_variables.

  • Add 'phase' argument in for future development. 'phase=True' means training. It is mainly for controlling batch normalization (if any) in the non-pre-trained part.

Example: If you have a batch normalization layer in the decoder, you should use

outputs = self._batch_norm(inputs, name='g_bn1', is_training=self.phase, activation_fn=tf.nn.relu, trainable=True)

  • Some changes to make the code more readable and easy to modify for future research.

  • I plan to add 'predict' function to enable saving predicted results for offline evaluation, post-processing, etc.

System requirements

Programming language

Python 3.5

Python Packages

tensorflow-gpu 1.3.0

Configure the network

All network hyperparameters are configured in


num_steps: how many iterations to train

save_interval: number of steps between model saves

random_seed: random seed for tensorflow

weight_decay: l2 regularization parameter

learning_rate: initial learning rate

power: parameter for poly learning rate

momentum: momentum

encoder_name: name of pre-trained model: res101, res50 or deeplab

pretrain_file: the initial pre-trained model file for transfer learning

dilated_type: type of dilated conv: regular, decompose, smooth_GI or smooth_SSC

data_list: training data list file
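The `learning_rate`, `power` and `num_steps` parameters suggest the commonly used "poly" learning-rate policy. The sketch below shows that policy in its usual form; that this matches the schedule implemented in this repository is an assumption, and the function name and the example values are chosen here for illustration.

```python
def poly_learning_rate(base_lr, step, num_steps, power):
    """Polynomial decay: base_lr * (1 - step / num_steps) ** power."""
    return base_lr * (1.0 - step / num_steps) ** power


# Example values (hypothetical, not taken from the repository's configuration).
for step in (0, 5000, 10000, 15000, 20000):
    print(step, poly_learning_rate(2.5e-4, step, num_steps=20000, power=0.9))
```

The rate starts at `base_lr` and decays to zero at `num_steps`; a `power` below 1 keeps the rate higher for longer than a linear decay would.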


valid_step: checkpoint number for testing/validation

valid_num_steps: must equal the number of testing/validation samples

valid_data_list: testing/validation data list file


out_dir: directory for saving prediction outputs

test_step: checkpoint number for prediction

test_num_steps: must equal the number of prediction samples

test_data_list: prediction data list filename

visual: whether to save visualizable prediction outputs


data_dir: data directory

batch_size: training batch size

input height: height of input image

input width: width of input image

num_classes: number of classes

ignore_label: label pixel value that should be ignored

random_scale: whether to perform random scaling data-augmentation

random_mirror: whether to perform random left-right flipping data-augmentation
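For segmentation, augmentations such as `random_mirror` have to be applied to the image and its label map together, otherwise the pixels and their labels no longer line up. A minimal sketch with nested lists follows; the function name, the 0.5 flip probability and the injected `rng` object are assumptions for illustration, not the repository's TensorFlow input pipeline.

```python
import random


def random_mirror(image, label, rng):
    """Flip image and label left-right together with probability 0.5.

    image and label are lists of rows; rng is any object with a
    random() method, such as a random.Random instance.
    """
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        label = [row[::-1] for row in label]
    return image, label


rng = random.Random(0)  # seeded generator for reproducible augmentation
image = [[1, 2, 3], [4, 5, 6]]
label = [[0, 0, 1], [0, 1, 1]]
flipped_image, flipped_label = random_mirror(image, label, rng)
```

A single random draw decides the fate of both arrays, which is the essential property.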


modeldir: where to store saved models

logfile: where to store training log

logdir: where to store log for tensorboard

Training and Testing

Start training

After configuring the network, we can start to train. Run


The training of Deeplab v2 ResNet will start.

Training process visualization

We employ tensorboard for visualization.

tensorboard --logdir=log --port=6006

You may visualize the graph of the model and (training images + ground truth labels + predicted labels).

To visualize the training loss curve, write your own script to make use of the training log.
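A starting point for such a script might look like the sketch below. The line format of the training log is not documented here, so the regular expression is purely hypothetical (it expects something like `step 12 loss 0.345`) and must be adapted to the actual log contents.

```python
import re

# Hypothetical line format; adjust to what the training log actually contains.
LINE_PATTERN = re.compile(r"step\s+(\d+).*?loss\s*[:=]?\s*([0-9.eE+-]+)")


def parse_losses(lines, pattern=LINE_PATTERN):
    """Return (steps, losses) for every line that matches the pattern."""
    steps, losses = [], []
    for line in lines:
        match = pattern.search(line)
        if match:
            steps.append(int(match.group(1)))
            losses.append(float(match.group(2)))
    return steps, losses
```

The two returned lists can be passed to any plotting library to draw the loss curve; lines that do not match the pattern are skipped.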

Testing and prediction

Select a checkpoint to test/validate your model in terms of pixel accuracy and mean IoU.

Fill the valid_step in with the checkpoint you want to test. Change valid_num_steps and valid_data_list accordingly. Run

python --option=test

The final output includes pixel accuracy and mean IoU.
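For reference, pixel accuracy, per-class IoU and mean IoU are usually derived from a confusion matrix, as in the plain-Python sketch below. This is a generic formulation, not the repository's evaluation code: the function names, the default `ignore_label=255` and the choice to skip classes that never occur are assumptions.

```python
def confusion_matrix(labels, preds, num_classes, ignore_label=255):
    """Build a num_classes x num_classes matrix; rows are ground truth."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(labels, preds):
        if truth == ignore_label:
            continue  # ignored pixels do not count towards any metric
        matrix[truth][pred] += 1
    return matrix


def segmentation_metrics(matrix):
    """Return (pixel_accuracy, per_class_iou, mean_iou)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(n))
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    valid = [v for v in ious if v == v]  # NaN != NaN, so this drops absent classes
    mean_iou = sum(valid) / len(valid) if valid else float("nan")
    pixel_acc = correct / total if total else float("nan")
    return pixel_acc, ious, mean_iou
```

Here `labels` and `preds` are flat sequences of class indices for all evaluated pixels. IoU for a class is TP / (TP + FP + FN), and the mean IoU averages this over the classes that appear.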


python --option=predict

The outputs will be saved in the 'output' folder.

