
Commit 0f9c79e (first commit, 0 parents)

336 files changed: +39549 -0 lines


README.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
# Pytorch implementation of SCL-Domain-Adaptive-Object-Detection

## Introduction

Please follow the [faster-rcnn](https://github.com/jwyang/faster-rcnn.pytorch) repository to set up the environment. We used PyTorch 0.4.0 for this project; other PyTorch versions may cause errors that have to be handled for each environment.
<br />
For convenience, this repository contains implementations of: <br />
* SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses ([link]())<br />
* Strong-Weak Distribution Alignment for Adaptive Object Detection, CVPR'19 ([link](https://arxiv.org/pdf/1812.04798.pdf)) <br />
* Domain Adaptive Faster R-CNN for Object Detection in the Wild, CVPR'18 (our re-implementation) ([link](https://arxiv.org/pdf/1803.03243.pdf)) <br />

### Data preparation <br />
We have included the following datasets in our implementation: <br />
* **Cityscapes, FoggyCityscapes**: Download from the [Cityscapes](https://www.cityscapes-dataset.com/) website; see the dataset preparation code in [DA-Faster RCNN](https://github.com/yuhuayc/da-faster-rcnn/tree/master/prepare_data). <br />
* **Clipart, WaterColor**: Dataset preparation instructions: [Cross Domain Detection](https://github.com/naoto0804/cross-domain-detection/tree/master/datasets). <br />
* **PASCAL_VOC 07+12**: Please follow the instructions in [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn#beyond-the-demo-installation-for-training-and-testing-models) to prepare the VOC datasets. <br />
* **Sim10k**: Website: [Sim10k](https://fcav.engin.umich.edu/sim-dataset/) <br />
* **Cityscape-Translated Sim10k**: TBA <br />
* **KITTI**: For data preparation please follow [VOD-converter](https://github.com/umautobots/vod-converter). <br />
* **INIT**: Download the dataset from this [website](http://zhiqiangshen.com/projects/INIT/index.html); the data preparation files can be found in this repository's [data preparation folder](https://github.com/harsh-99/SCL-Domain-adaptive-object-detection/tree/master/lib/datasets/data_prep).

All of our code is written for the Pascal VOC format. For example, the cityscape dataset is stored as: <br />

```
$ cd cityscape/VOC2012
$ ls
Annotations  ImageSets  JPEGImages
$ cd ImageSets/Main
$ ls
train.txt val.txt trainval.txt test.txt
```
**Note:** If you want to use this code on your own dataset, arrange the dataset in the PASCAL VOC format, create a dataset class in *lib/datasets/*, register it in *lib/datasets/factory.py* and *lib/datasets/config_dataset.py*, and then add the dataset option to *lib/model/utils/parser_func.py*, as the sketch below illustrates.
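
A minimal sketch of the registration step, assuming *factory.py* follows the usual py-faster-rcnn pattern of a module-level `__sets` registry; `my_dataset` and its splits are hypothetical names for your own dataset class:

```
# In lib/datasets/factory.py (sketch): register a hypothetical dataset
# class so it can be selected by name from the command line.
from datasets.my_dataset import my_dataset  # hypothetical class in lib/datasets/

for split in ['train', 'trainval', 'test']:
    name = 'my_dataset_{}'.format(split)
    # Default-argument trick binds the current split to each lambda.
    __sets[name] = (lambda split=split: my_dataset(split))
```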

### Data path <br />
Write the paths of your dataset directories in lib/datasets/config_dataset.py, for example:

```
__D.CLIPART = "./clipart"
__D.WATER = "./watercolor"
__D.SIM10K = "Sim10k/VOC2012"
__D.SIM10K_CYCLE = "Sim10k_cycle/VOC2012"
__D.CITYSCAPE_CAR = "./cityscape/VOC2007"
__D.CITYSCAPE = "../DA_Detection/cityscape/VOC2007"
__D.FOGGYCITY = "../DA_Detection/foggy/VOC2007"

__D.INIT_SUNNY = "./init_sunny"
__D.INIT_NIGHT = "./init_night"
```
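
For reference, the `__D.*` assignments above suggest *config_dataset.py* keeps its settings in an easydict object, as lib/model/utils/config.py does in the parent faster-rcnn.pytorch project; a sketch of adding your own entry under that assumption:

```
# lib/datasets/config_dataset.py (sketch, assuming an easydict-based
# config object as in faster-rcnn.pytorch's config.py).
from easydict import EasyDict as edict

__D = edict()

__D.CLIPART = "./clipart"        # existing entry from the snippet above
__D.MY_DATASET = "./my_dataset"  # hypothetical entry for your own data
```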
### Pre-trained model <br/>

We used two models pre-trained on ImageNet as backbones for our experiments: VGG16 and ResNet101. You can download them from:

* VGG16 - [Dropbox](https://www.dropbox.com/s/s3brpk0bdq60nyb/vgg16_caffe.pth?dl=0), [VT Server](https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/vgg16_caffe.pth) <br />
* ResNet101 - [Dropbox](https://www.dropbox.com/s/iev3tkbz5wyyuz9/resnet101_caffe.pth?dl=0), [VT Server](https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/resnet101_caffe.pth)<br />

To set their paths in the code, see __C.VGG_PATH and __C.RESNET_PATH in lib/model/utils/config.py.
<br />
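
These checkpoints are plain PyTorch state dicts, so they can be loaded into a torchvision VGG16 with the usual partial-load pattern; a minimal sketch (not the repository's own loader, and the path below is illustrative, set the real one via __C.VGG_PATH):

```
import torch
import torchvision.models as models

# Minimal sketch: load the caffe-converted VGG16 weights, keeping only
# the keys that exist in the torchvision model.
vgg = models.vgg16()
state_dict = torch.load('data/pretrained_model/vgg16_caffe.pth')  # illustrative path
vgg.load_state_dict({k: v for k, v in state_dict.items() if k in vgg.state_dict()})
```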

**Our trained models** <br />
We provide our trained models for FoggyCityscapes, WaterColor, and Clipart.<br />
1) Adaptation from Cityscapes to FoggyCityscapes:<br />
* VGG16 - [Google Drive](https://drive.google.com/open?id=1sciKY9BUnmAfSrJdM2vJCeoMzH0XclXl)<br />
* ResNet101 - [Google Drive](https://drive.google.com/open?id=1figzDfm5_8jopD9SP1cJNl0u2MhYaW7o)<br />
2) Adaptation from Pascal VOC to WaterColor:<br />
* ResNet101 - [Google Drive](https://drive.google.com/open?id=11hwlx5Y7Yam0IUuv49lL1qwJxUyf1QW1)<br />
3) Adaptation from Pascal VOC to Clipart:<br />
* ResNet101 - [Google Drive](https://drive.google.com/open?id=1tzYhExZq2jJNmKLGEk8_-2bX_WkLnW22)
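
These are ordinary PyTorch checkpoints, so you can inspect one before running evaluation; a minimal sketch, where the filename and the 'model' key (the convention in faster-rcnn.pytorch-style checkpoints) are assumptions:

```
import torch

# Minimal sketch: peek inside a downloaded checkpoint.
checkpoint = torch.load('scl_foggy_vgg16.pth', map_location='cpu')  # hypothetical filename
print(list(checkpoint.keys()))  # expect something like ['model', 'optimizer', ...]
```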
### Train

We have provided sample training commands in the train_scripts folder; however, those only cover our own model. Commands for all three models are given below.

For SCL (Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses):
```
CUDA_VISIBLE_DEVICES=$1 python trainval_net_SCL.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --save_dir $2
```
For Domain Adaptive Faster R-CNN for Object Detection in the Wild: <br />
```
CUDA_VISIBLE_DEVICES=$1 python trainval_net_dfrcnn.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --save_dir $2
```
For Strong-Weak Distribution Alignment for Adaptive Object Detection: <br />
```
CUDA_VISIBLE_DEVICES=$1 python trainval_net_global_local.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --gc --lc --save_dir $2
```
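
In all three commands, `$1` selects the visible GPU(s) through `CUDA_VISIBLE_DEVICES` and `$2` is the directory passed to `--save_dir` for checkpoints. For example, `CUDA_VISIBLE_DEVICES=0 python trainval_net_SCL.py --cuda --net vgg16 --dataset cityscape --dataset_t foggy_cityscape --save_dir ./models` trains on GPU 0 and saves checkpoints under ./models (the GPU id and directory here are illustrative).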

### Test

We have provided sample testing commands for our model in the test_scripts folder. For the other models, please adapt the corresponding training commands above.

### Examples
<div align=center>
<img src="https://user-images.githubusercontent.com/3794909/65907453-66be7900-e392-11e9-996e-daa0d41ee78b.png" width="780">
</div>
<div align=center>
Figure 1: Detection Results from Pascal VOC to Clipart.
</div>

<div align=center>
<img src="https://user-images.githubusercontent.com/3794909/65907605-a71df700-e392-11e9-9f95-18d65ff4ceb7.png" width="780">
</div>
<div align=center>
Figure 2: Detection Results from Pascal VOC to Watercolor.
</div>

_init_paths.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
import os.path as osp
import sys

def add_path(path):
    # Prepend a path to sys.path if it is not already there.
    if path not in sys.path:
        sys.path.insert(0, path)

this_dir = osp.dirname(__file__)

# Add lib to PYTHONPATH
lib_path = osp.join(this_dir, 'lib')
add_path(lib_path)

# Add the COCO PythonAPI to PYTHONPATH
coco_path = osp.join(this_dir, 'data', 'coco', 'PythonAPI')
add_path(coco_path)
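
For context, entry-point scripts at the repository root import this module before anything under lib/ so that those imports resolve; a minimal sketch (the importing script is hypothetical, but lib/model/utils/config.py is the file referenced in the README above):

```
# At the top of a root-level script (sketch): _init_paths must be
# imported before any module that lives under lib/.
import _init_paths
from model.utils.config import cfg  # resolves because lib/ is now on sys.path
```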
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

# logger options
image_save_iter: 1000       # How often do you want to save output images during training
image_display_iter: 100     # How often do you want to display output images during training
display_size: 8             # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 1                 # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 4           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 256               # first resize the shortest image side to this size
crop_image_height: 256      # random crop image of this height
crop_image_width: 256       # random crop image of this width
data_root: ./datasets/demo_edges2handbags/  # dataset folder location
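
The translation configs in this commit (this one and the three that follow) are plain YAML in the MUNIT style, so they can be read with PyYAML; a minimal sketch, assuming a standard `yaml` installation and using the one config filename visible in this commit:

```
import yaml

# Minimal sketch: load a config from this commit into a nested dict.
with open('cfgs/configs/foggy2_4.yaml') as f:
    config = yaml.safe_load(f)

print(config['lr'])          # 0.0001
print(config['gen']['dim'])  # 64
```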
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

# logger options
image_save_iter: 1000       # How often do you want to save output images during training
image_display_iter: 100     # How often do you want to display output images during training
display_size: 8             # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 1                 # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 2           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 256               # first resize the shortest image side to this size
crop_image_height: 256      # random crop image of this height
crop_image_width: 256       # random crop image of this width

data_folder_train_a: ./datasets/demo_edges2handbags/trainA
data_list_train_a: ./datasets/demo_edges2handbags/list_trainA.txt
data_folder_test_a: ./datasets/demo_edges2handbags/testA
data_list_test_a: ./datasets/demo_edges2handbags/list_testA.txt
data_folder_train_b: ./datasets/demo_edges2handbags/trainB
data_list_train_b: ./datasets/demo_edges2handbags/list_trainB.txt
data_folder_test_b: ./datasets/demo_edges2handbags/testB
data_list_test_b: ./datasets/demo_edges2handbags/list_testB.txt
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

# logger options
image_save_iter: 10000      # How often do you want to save output images during training
image_display_iter: 500     # How often do you want to display output images during training
display_size: 16            # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 10                # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 2           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 600               # first resize the shortest image side to this size
crop_image_height: 480      # random crop image of this height
crop_image_width: 480       # random crop image of this width
data_root: ./datasets/cityscape/  # dataset folder location

cfgs/configs/foggy2_4.yaml

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
# Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Used for cityscape 2 (foggy2)

# logger options
image_save_iter: 10000      # How often do you want to save output images during training
image_display_iter: 500     # How often do you want to display output images during training
display_size: 16            # How many images do you want to display each time
snapshot_save_iter: 10000   # How often do you want to save trained models
log_iter: 10                # How often do you want to log the training stats

# optimization options
max_iter: 1000000           # maximum number of training iterations
batch_size: 1               # batch size
weight_decay: 0.0001        # weight decay
beta1: 0.5                  # Adam parameter
beta2: 0.999                # Adam parameter
init: kaiming               # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                  # initial learning rate
lr_policy: step             # learning rate scheduler
step_size: 100000           # how often to decay learning rate
gamma: 0.5                  # how much to decay learning rate
gan_w: 1                    # weight of adversarial loss
recon_x_w: 10               # weight of image reconstruction loss
recon_s_w: 1                # weight of style reconstruction loss
recon_c_w: 1                # weight of content reconstruction loss
recon_x_cyc_w: 0            # weight of explicit style augmented cycle consistency loss
vgg_w: 0                    # weight of domain-invariant perceptual loss

# model options
gen:
  dim: 64                   # number of filters in the bottommost layer
  mlp_dim: 256              # number of filters in MLP
  style_dim: 8              # length of style code
  activ: relu               # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 4           # number of downsampling layers in content encoder
  n_res: 4                  # number of residual blocks in content encoder/decoder
  pad_type: reflect         # padding type [zero/reflect]
dis:
  dim: 64                   # number of filters in the bottommost layer
  norm: none                # normalization layer [none/bn/in/ln]
  activ: lrelu              # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 4                # number of layers in D
  gan_type: lsgan           # GAN loss [lsgan/nsgan]
  num_scales: 3             # number of scales
  pad_type: reflect         # padding type [zero/reflect]

# data options
input_dim_a: 3              # number of image channels [1/3]
input_dim_b: 3              # number of image channels [1/3]
num_workers: 8              # number of data loading threads
new_size: 352               # first resize the shortest image side to this size
crop_image_height: 352      # random crop image of this height
crop_image_width: 352       # random crop image of this width
data_root: ./datasets/cityscape_2/  # dataset folder location
