Commit a389648: Initial Commit


52 files changed: +6854 -0 lines

.gitignore

Lines changed: 57 additions & 0 deletions

# output dir
output
instant_test_output
inference_test_output

*.png
*.json
*.diff
*.jpg
!/projects/DensePose/doc/images/*.jpg

# compilation and distribution
__pycache__
_ext
*.pyc
*.pyd
*.so
*.dll
*.egg-info/
build/
dist/
wheels/

# pytorch/python/numpy formats
*.pth
*.pkl
*.npy
*.ts
model_ts*.txt

# ipython/jupyter notebooks
*.ipynb
**/.ipynb_checkpoints/

# Editor temporaries
*.swn
*.swo
*.swp
*~

# editor settings
.idea
.vscode
_darcs

# project dirs
/detectron2/model_zoo/configs
/datasets/*
!/datasets/*.md
/projects/*/datasets
/models
/snippet
/refer
/output_*

**/ops/lib*.so.*

README.md

Lines changed: 75 additions & 0 deletions

# GRES: Generalized Referring Expression Segmentation

[![PyTorch](https://img.shields.io/badge/PyTorch-1.11.0-%23EE4C2C.svg?style=&logo=PyTorch&logoColor=white)](https://pytorch.org/)
[![Python](https://img.shields.io/badge/Python-3.7%20|%203.8%20|%203.9-blue.svg?style=&logo=python&logoColor=ffdd54)](https://www.python.org/downloads/)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gres-generalized-referring-expression-1/generalized-referring-expression-segmentation)](https://paperswithcode.com/sota/generalized-referring-expression-segmentation?p=gres-generalized-referring-expression-1)

**[🏠[Project page]](https://henghuiding.github.io/GRES/)**   **[📄[Arxiv]](https://arxiv.org/abs/2306.00968)**   **[🔥[New Dataset]](https://github.com/henghuiding/gRefCOCO)**

This repository contains the code for the paper [GRES: Generalized Referring Expression Segmentation](https://arxiv.org/abs/2306.00968).
## Installation

The code is tested under CUDA 11.8, PyTorch 1.11.0, and Detectron2 0.6; a consolidated setup sketch follows the step list below.

1. Install [detectron2](https://github.com/facebookresearch/detectron2) following the [manual](https://detectron2.readthedocs.io/en/latest/)
2. Run `sh make.sh` under `gres_model/modeling/pixel_decoder/ops`
3. Install the other required packages: `pip install -r requirements.txt`
4. Prepare the dataset following `datasets/DATASET.md`
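As a convenience, here is a minimal sketch that strings the four steps together. The conda environment name, Python version, wheel choice, and the install-from-source command for detectron2 are assumptions rather than part of the original instructions; pick the PyTorch 1.11.0 build that matches your CUDA toolkit.

```
# Hypothetical consolidated setup; environment name and versions are assumptions
conda create -n gres python=3.8 -y
conda activate gres

# PyTorch 1.11.0: choose the build matching your CUDA toolkit (see pytorch.org)
pip install torch==1.11.0 torchvision==0.12.0

# Step 1: detectron2 (tested with v0.6)
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Step 2: compile the deformable-attention ops
cd gres_model/modeling/pixel_decoder/ops && sh make.sh && cd -

# Step 3: other required packages
pip install -r requirements.txt

# Step 4: prepare the dataset following datasets/DATASET.md
```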
## Inference

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto --eval-only \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```
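For a quick sanity run on a single GPU, the same entry point works with the launcher flags scaled down; a sketch with the same placeholder paths:

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 1 --eval-only \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```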
## Training

First, download the backbone weights (`swin_base_patch4_window12_384_22k.pth`) and convert them into detectron2 format (`swin_base_patch4_window12_384_22k.pkl`) using the script:

```
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_base_patch4_window12_384_22k.pth swin_base_patch4_window12_384_22k.pkl
```

The converted file should match the `WEIGHTS` path in the config (e.g. `models/swin_base_patch4_window12_384_22k.pkl` for `configs/referring_swin_base.yaml`). Then start training:

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```
To customize options, append config overrides to the command. For example:

```
SOLVER.IMS_PER_BATCH 48
SOLVER.BASE_LR 0.00001
```

For the full list of base options, see `configs/referring_R50.yaml` and `configs/Base-COCO-InstanceSegmentation.yaml`.
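Put together, a full training command with these overrides might look like the following; the `OUTPUT_DIR` value is an arbitrary example, and the weights path follows `configs/referring_swin_base.yaml`:

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto \
    MODEL.WEIGHTS models/swin_base_patch4_window12_384_22k.pkl \
    OUTPUT_DIR output/gres_swin_base \
    SOLVER.IMS_PER_BATCH 48 SOLVER.BASE_LR 0.00001
```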
## Models

[Onedrive](https://entuedu-my.sharepoint.com/:u:/g/personal/liuc0058_e_ntu_edu_sg/EbrpMReEP4RBoWEleQOHsPoBk9Ttj5SkxX4NJ1-vwYE-eQ?e=JzovhE)
## Acknowledgement

This project is based on [refer](https://github.com/lichengunc/refer), [maskformer](https://github.com/facebookresearch/Mask2Former), and [detectron2](https://github.com/facebookresearch/detectron2). Many thanks to the authors for their great work!

## BibTeX

Please consider citing GRES if it helps your research.
```latex
@inproceedings{GRES,
  title={{GRES}: Generalized Referring Expression Segmentation},
  author={Liu, Chang and Ding, Henghui and Jiang, Xudong},
  booktitle={CVPR},
  year={2023}
}
```
configs/Base-COCO-InstanceSegmentation.yaml

Lines changed: 50 additions & 0 deletions

MODEL:
  BACKBONE:
    FREEZE_AT: 0
    NAME: "build_resnet_backbone"
  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  RESNETS:
    DEPTH: 50
    STEM_TYPE: "basic"  # not used
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: False
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
    # NORM: "SyncBN"
    RES5_MULTI_GRID: [1, 1, 1]  # not used
DATASETS:
  REF_ROOT: "refer/data/"
  TRAIN: ("refcoco_unc_train",)
  TEST: ("refcoco_unc_val",)
REFERRING:
  BERT_TYPE: "bert-base-uncased"
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.0001
  STEPS: (327778, 355092)
  MAX_ITER: 368750
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 10
  WEIGHT_DECAY: 0.05
  OPTIMIZER: "ADAMW"
  BACKBONE_MULTIPLIER: 0.1
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "full_model"
    CLIP_VALUE: 0.01
    NORM_TYPE: 2.0
  AMP:
    ENABLED: True
INPUT:
  IMAGE_SIZE: 1024
  MIN_SCALE: 0.75
  MAX_SCALE: 1.0
  FORMAT: "RGB"
  DATASET_MAPPER_NAME: "refcoco"
TEST:
  EVAL_PERIOD: 5000
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: False
  NUM_WORKERS: 4
VERSION: 2

configs/referring_R50.yaml

Lines changed: 57 additions & 0 deletions

_BASE_: Base-COCO-InstanceSegmentation.yaml
MODEL:
  META_ARCHITECTURE: "GRES"
  SEM_SEG_HEAD:
    NAME: "ReferringHead"
    IGNORE_VALUE: 255
    NUM_CLASSES: 80
    LOSS_WEIGHT: 1.0
    CONVS_DIM: 256
    MASK_DIM: 256
    NORM: "GN"
    # pixel decoder
    PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
    COMMON_STRIDE: 4
    TRANSFORMER_ENC_LAYERS: 6
  MASK_FORMER:
    TRANSFORMER_DECODER_NAME: "MultiScaleMaskedReferringDecoder"
    TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
    DEEP_SUPERVISION: True
    LANG_ATT_WEIGHT: 0.1
    NO_OBJECT_WEIGHT: 0.1
    CLASS_WEIGHT: 2.0
    MASK_WEIGHT: 5.0
    DICE_WEIGHT: 5.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.0
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    PRE_NORM: False
    ENFORCE_INPUT_PROJ: False
    SIZE_DIVISIBILITY: 32
    DEC_LAYERS: 10  # 9 decoder layers, add one for the loss on learnable query
    TRAIN_NUM_POINTS: 12544
    OVERSAMPLE_RATIO: 3.0
    IMPORTANCE_SAMPLE_RATIO: 0.75
    TEST:
      SEMANTIC_ON: False
      INSTANCE_ON: True
      PANOPTIC_ON: False
      OVERLAP_THRESHOLD: 0.8
      OBJECT_MASK_THRESHOLD: 0.8
INPUT:
  IMAGE_SIZE: 480

DATASETS:
  TRAIN: ("grefcoco_unc_train",)
  TEST: ("grefcoco_unc_val",)

SOLVER:
  IMS_PER_BATCH: 48
  BASE_LR: 0.00001
  STEPS: (110000, 210000)
  MAX_ITER: 300000

configs/referring_swin_base.yaml

Lines changed: 16 additions & 0 deletions

_BASE_: referring_R50.yaml
MODEL:
  BACKBONE:
    NAME: "D2SwinTransformer"
  SWIN:
    EMBED_DIM: 128
    DEPTHS: [2, 2, 18, 2]
    NUM_HEADS: [4, 8, 16, 32]
    WINDOW_SIZE: 12
    APE: False
    DROP_PATH_RATE: 0.3
    PATCH_NORM: True
    PRETRAIN_IMG_SIZE: 384
  WEIGHTS: "models/swin_base_patch4_window12_384_22k.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]

configs/referring_swin_tiny.yaml

Lines changed: 15 additions & 0 deletions

_BASE_: referring_R50.yaml
MODEL:
  BACKBONE:
    NAME: "D2SwinTransformer"
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [2, 2, 6, 2]
    NUM_HEADS: [3, 6, 12, 24]
    WINDOW_SIZE: 7
    APE: False
    DROP_PATH_RATE: 0.3
    PATCH_NORM: True
  WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
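The config files above form a `_BASE_` chain: both Swin configs extend `referring_R50.yaml`, which extends `Base-COCO-InstanceSegmentation.yaml`, with later files overriding earlier ones. A minimal sketch of inspecting the merged result with detectron2's config API; allowing new keys here stands in for the project-specific config defaults (e.g. `REFERRING`, `MASK_FORMER`) that `train_net.py` would normally register:

```python
# Minimal sketch, assuming detectron2 is installed and this runs from the repo root.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.set_new_allowed(True)  # accept project-specific keys not in detectron2's defaults
cfg.merge_from_file("configs/referring_swin_base.yaml")  # resolves the _BASE_ chain

print(cfg.MODEL.BACKBONE.NAME)   # "D2SwinTransformer" (overridden by the Swin config)
print(cfg.SOLVER.IMS_PER_BATCH)  # 48, set in referring_R50.yaml
print(cfg.MODEL.SWIN.DEPTHS)     # [2, 2, 18, 2] for Swin-B
```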

datasets/DATASET.md

Lines changed: 16 additions & 0 deletions

## Dataset

The dataset folder should be organized as follows:

```
datasets
├── grefcoco
│   ├── grefs(unc).json
│   └── instances.json
└── images
    └── train2014
        ├── COCO_train2014_xxxxxxxxxxxx.jpg
        ├── COCO_train2014_xxxxxxxxxxxx.jpg
        └── ...
```

Download our gRefCOCO from the [gRefCOCO repository](https://github.com/henghuiding/gRefCOCO)!
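A small, hypothetical sanity check for the layout above, with the path names taken directly from the tree:

```python
# Hypothetical check that the expected dataset files are in place.
from pathlib import Path

root = Path("datasets")
expected = [
    root / "grefcoco" / "grefs(unc).json",
    root / "grefcoco" / "instances.json",
    root / "images" / "train2014",
]
for path in expected:
    print(f"{'ok' if path.exists() else 'MISSING':>7}  {path}")
```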
