
Commit 0f9c79e (first commit, 0 parents)

336 files changed: +39549 -0 lines


README.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
# Pytorch implementation of SCL-Domain-Adaptive-Object-Detection

## Introduction

Please follow the [faster-rcnn](https://github.com/jwyang/faster-rcnn.pytorch) repository to set up the environment. We used PyTorch 0.4.0 for this project; other PyTorch versions may cause errors that have to be handled for each environment.
<br />
For convenience, this repository contains implementations of: <br />
* SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses ([link]())<br />
* Strong-Weak Distribution Alignment for Adaptive Object Detection, CVPR'19 ([link](https://arxiv.org/pdf/1812.04798.pdf)) <br />
* Domain Adaptive Faster R-CNN for Object Detection in the Wild, CVPR'18 (our re-implementation) ([link](https://arxiv.org/pdf/1803.03243.pdf)) <br />

### Data preparation <br />
We have included the following datasets in our implementation: <br />
* **Cityscapes, FoggyCityscapes**: Download from the [Cityscapes](https://www.cityscapes-dataset.com/) website; see the dataset preparation code in [DA-Faster RCNN](https://github.com/yuhuayc/da-faster-rcnn/tree/master/prepare_data). <br />
* **Clipart, WaterColor**: Dataset preparation instructions: [Cross Domain Detection](https://github.com/naoto0804/cross-domain-detection/tree/master/datasets). <br />
* **PASCAL_VOC 07+12**: Please follow the instructions in [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn#beyond-the-demo-installation-for-training-and-testing-models) to prepare the VOC datasets. <br />
* **Sim10k**: Website: [Sim10k](https://fcav.engin.umich.edu/sim-dataset/) <br />
* **Cityscape-Translated Sim10k**: TBA <br />
* **KITTI**: For data preparation please follow [VOD-converter](https://github.com/umautobots/vod-converter). <br />
* **INIT**: Download the dataset from this [website](http://zhiqiangshen.com/projects/INIT/index.html); the data preparation files can be found in this repository's [data preparation folder](https://github.com/harsh-99/SCL-Domain-adaptive-object-detection/tree/master/lib/datasets/data_prep).

All of our code is written for the Pascal VOC format. For example, the cityscape dataset is stored as: <br />

```
$ cd cityscape/VOC2012
$ ls
Annotations  ImageSets  JPEGImages
$ cd ImageSets/Main
$ ls
train.txt val.txt trainval.txt test.txt
```
**Note:** If you want to use this code on your own dataset, arrange the dataset in the PASCAL VOC format, create a dataset class in *lib/datasets/*, register it in *lib/datasets/factory.py* and *lib/datasets/config_dataset.py*, and then add the dataset option to *lib/model/utils/parser_func.py*, as the sketch below illustrates.
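
A minimal sketch of the registration step, assuming *factory.py* follows the usual py-faster-rcnn pattern of a module-level `__sets` registry; `my_dataset` and its splits are hypothetical names for your own dataset class:

```
# In lib/datasets/factory.py (sketch): register a hypothetical dataset
# class so it can be selected by name from the command line.
from datasets.my_dataset import my_dataset  # hypothetical class in lib/datasets/

for split in ['train', 'trainval', 'test']:
    name = 'my_dataset_{}'.format(split)
    # Default-argument trick binds the current split to each lambda.
    __sets[name] = (lambda split=split: my_dataset(split))
```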

### Data path <br />
Write the paths of your dataset directories in lib/datasets/config_dataset.py, for example:

```
__D.CLIPART = "./clipart"
__D.WATER = "./watercolor"
__D.SIM10K = "Sim10k/VOC2012"
__D.SIM10K_CYCLE = "Sim10k_cycle/VOC2012"
__D.CITYSCAPE_CAR = "./cityscape/VOC2007"
__D.CITYSCAPE = "../DA_Detection/cityscape/VOC2007"
__D.FOGGYCITY = "../DA_Detection/foggy/VOC2007"

__D.INIT_SUNNY = "./init_sunny"
__D.INIT_NIGHT = "./init_night"
```
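
For reference, the `__D.*` assignments above suggest *config_dataset.py* keeps its settings in an easydict object, as lib/model/utils/config.py does in the parent faster-rcnn.pytorch project; a sketch of adding your own entry under that assumption:

```
# lib/datasets/config_dataset.py (sketch, assuming an easydict-based
# config object as in faster-rcnn.pytorch's config.py).
from easydict import EasyDict as edict

__D = edict()

__D.CLIPART = "./clipart"        # existing entry from the snippet above
__D.MY_DATASET = "./my_dataset"  # hypothetical entry for your own data
```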
### Pre-trained model <br/>

We used two models pre-trained on ImageNet as backbones for our experiments: VGG16 and ResNet101. You can download them from:

* VGG16 - [Dropbox](https://www.dropbox.com/s/s3brpk0bdq60nyb/vgg16_caffe.pth?dl=0), [VT Server](https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/vgg16_caffe.pth) <br />
* ResNet101 - [Dropbox](https://www.dropbox.com/s/iev3tkbz5wyyuz9/resnet101_caffe.pth?dl=0), [VT Server](https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/resnet101_caffe.pth)<br />

To set their paths in the code, see __C.VGG_PATH and __C.RESNET_PATH in lib/model/utils/config.py.
<br />
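
These checkpoints are plain PyTorch state dicts, so they can be loaded into a torchvision VGG16 with the usual partial-load pattern; a minimal sketch (not the repository's own loader, and the path below is illustrative, set the real one via __C.VGG_PATH):

```
import torch
import torchvision.models as models

# Minimal sketch: load the caffe-converted VGG16 weights, keeping only
# the keys that exist in the torchvision model.
vgg = models.vgg16()
state_dict = torch.load('data/pretrained_model/vgg16_caffe.pth')  # illustrative path
vgg.load_state_dict({k: v for k, v in state_dict.items() if k in vgg.state_dict()})
```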

**Our trained models** <br />
We provide our trained models for FoggyCityscapes, WaterColor, and Clipart.<br />
1) Adaptation from Cityscapes to FoggyCityscapes:<br />
* VGG16 - [Google Drive](https://drive.google.com/open?id=1sciKY9BUnmAfSrJdM2vJCeoMzH0XclXl)<br />
* ResNet101 - [Google Drive](https://drive.google.com/open?id=1figzDfm5_8jopD9SP1cJNl0u2MhYaW7o)<br />
2) Adaptation from Pascal VOC to WaterColor:<br />
* ResNet101 - [Google Drive](https://drive.google.com/open?id=11hwlx5Y7Yam0IUuv49lL1qwJxUyf1QW1)<br />
3) Adaptation from Pascal VOC to Clipart:<br />
* ResNet101 - [Google Drive](https://drive.google.com/open?id=1tzYhExZq2jJNmKLGEk8_-2bX_WkLnW22)
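
These are ordinary PyTorch checkpoints, so you can inspect one before running evaluation; a minimal sketch, where the filename and the 'model' key (the convention in faster-rcnn.pytorch-style checkpoints) are assumptions:

```
import torch

# Minimal sketch: peek inside a downloaded checkpoint.
checkpoint = torch.load('scl_foggy_vgg16.pth', map_location='cpu')  # hypothetical filename
print(list(checkpoint.keys()))  # expect something like ['model', 'optimizer', ...]
```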
### Train

We have provided sample training commands in the train_scripts folder; however, those only cover our own model. Commands for all three models are given below.

For SCL (Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses):
```
CUDA_VISIBLE_DEVICES=$1 python trainval_net_SCL.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --save_dir $2
```
For Domain Adaptive Faster R-CNN for Object Detection in the Wild: <br />
```
CUDA_VISIBLE_DEVICES=$1 python trainval_net_dfrcnn.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --save_dir $2
```
For Strong-Weak Distribution Alignment for Adaptive Object Detection: <br />
```
CUDA_VISIBLE_DEVICES=$1 python trainval_net_global_local.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --gc --lc --save_dir $2
```
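
In all three commands, `$1` selects the visible GPU(s) through `CUDA_VISIBLE_DEVICES` and `$2` is the directory passed to `--save_dir` for checkpoints. For example, `CUDA_VISIBLE_DEVICES=0 python trainval_net_SCL.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --save_dir ./models` trains on GPU 0 and saves checkpoints under ./models (the GPU id and directory here are illustrative).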

### Test

We have provided sample testing commands for our model in the test_scripts folder. For the other models, please adapt the corresponding training commands above.

### Examples
<div align=center>
<img src="https://user-images.githubusercontent.com/3794909/65907453-66be7900-e392-11e9-996e-daa0d41ee78b.png" width="780">
</div>
<div align=center>
Figure 1: Detection Results from Pascal VOC to Clipart.
</div>

<div align=center>
<img src="https://user-images.githubusercontent.com/3794909/65907605-a71df700-e392-11e9-9f95-18d65ff4ceb7.png" width="780">
</div>
<div align=center>
Figure 2: Detection Results from Pascal VOC to Watercolor.
</div>

_init_paths.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
import os.path as osp
import sys

def add_path(path):
    # Prepend a path to sys.path if it is not already there.
    if path not in sys.path:
        sys.path.insert(0, path)

this_dir = osp.dirname(__file__)

# Add lib to PYTHONPATH
lib_path = osp.join(this_dir, 'lib')
add_path(lib_path)

# Add the COCO PythonAPI to PYTHONPATH
coco_path = osp.join(this_dir, 'data', 'coco', 'PythonAPI')
add_path(coco_path)
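
For context, entry-point scripts at the repository root import this module before anything under lib/ so that those imports resolve; a minimal sketch (the importing script is hypothetical, but lib/model/utils/config.py is the file referenced in the README above):

```
# At the top of a root-level script (sketch): _init_paths must be
# imported before any module that lives under lib/.
import _init_paths
from model.utils.config import cfg  # resolves because lib/ is now on sys.path
```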
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

# logger options
image_save_iter: 1000       # How often do you want to save output images during training
image_display_iter: 100     # How often do you want to display output images during training
display_size: 8             # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 1                 # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 4           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 256               # first resize the shortest image side to this size
crop_image_height: 256      # random crop image of this height
crop_image_width: 256       # random crop image of this width
data_root: ./datasets/demo_edges2handbags/  # dataset folder location
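
The translation configs in this commit (this one and the three that follow) are plain YAML in the MUNIT style, so they can be read with PyYAML; a minimal sketch, assuming a standard `yaml` installation and using the one config filename visible in this commit:

```
import yaml

# Minimal sketch: load a config from this commit into a nested dict.
with open('cfgs/configs/foggy2_4.yaml') as f:
    config = yaml.safe_load(f)

print(config['lr'])          # 0.0001
print(config['gen']['dim'])  # 64
```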
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

# logger options
image_save_iter: 1000       # How often do you want to save output images during training
image_display_iter: 100     # How often do you want to display output images during training
display_size: 8             # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 1                 # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 2           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 256               # first resize the shortest image side to this size
crop_image_height: 256      # random crop image of this height
crop_image_width: 256       # random crop image of this width

data_folder_train_a: ./datasets/demo_edges2handbags/trainA
data_list_train_a: ./datasets/demo_edges2handbags/list_trainA.txt
data_folder_test_a: ./datasets/demo_edges2handbags/testA
data_list_test_a: ./datasets/demo_edges2handbags/list_testA.txt
data_folder_train_b: ./datasets/demo_edges2handbags/trainB
data_list_train_b: ./datasets/demo_edges2handbags/list_trainB.txt
data_folder_test_b: ./datasets/demo_edges2handbags/testB
data_list_test_b: ./datasets/demo_edges2handbags/list_testB.txt
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

# logger options
image_save_iter: 10000      # How often do you want to save output images during training
image_display_iter: 500     # How often do you want to display output images during training
display_size: 16            # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 10                # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 2           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 600               # first resize the shortest image side to this size
crop_image_height: 480      # random crop image of this height
crop_image_width: 480       # random crop image of this width
data_root: ./datasets/cityscape/  # dataset folder location

cfgs/configs/foggy2_4.yaml

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Used for cityscape 2 (foggy2)

# logger options
image_save_iter: 10000      # How often do you want to save output images during training
image_display_iter: 500     # How often do you want to display output images during training
display_size: 16            # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 10                # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 4           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 352               # first resize the shortest image side to this size
crop_image_height: 352      # random crop image of this height
crop_image_width: 352       # random crop image of this width
data_root: ./datasets/cityscape_2/  # dataset folder location
