Commit a389648: Initial Commit


52 files changed: +6854 -0 lines

.gitignore

Lines changed: 57 additions & 0 deletions

# output dir
output
instant_test_output
inference_test_output

*.png
*.json
*.diff
*.jpg
!/projects/DensePose/doc/images/*.jpg

# compilation and distribution
__pycache__
_ext
*.pyc
*.pyd
*.so
*.dll
*.egg-info/
build/
dist/
wheels/

# pytorch/python/numpy formats
*.pth
*.pkl
*.npy
*.ts
model_ts*.txt

# ipython/jupyter notebooks
*.ipynb
**/.ipynb_checkpoints/

# Editor temporaries
*.swn
*.swo
*.swp
*~

# editor settings
.idea
.vscode
_darcs

# project dirs
/detectron2/model_zoo/configs
/datasets/*
!/datasets/*.md
/projects/*/datasets
/models
/snippet
/refer
/output_*

**/ops/lib*.so.*

README.md

Lines changed: 75 additions & 0 deletions

# GRES: Generalized Referring Expression Segmentation

[![PyTorch](https://img.shields.io/badge/PyTorch-1.11.0-%23EE4C2C.svg?style=&logo=PyTorch&logoColor=white)](https://pytorch.org/)
[![Python](https://img.shields.io/badge/Python-3.7%20|%203.8%20|%203.9-blue.svg?style=&logo=python&logoColor=ffdd54)](https://www.python.org/downloads/)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gres-generalized-referring-expression-1/generalized-referring-expression-segmentation)](https://paperswithcode.com/sota/generalized-referring-expression-segmentation?p=gres-generalized-referring-expression-1)

**[🏠[Project page]](https://henghuiding.github.io/GRES/)**   **[📄[Arxiv]](https://arxiv.org/abs/2306.00968)**   **[🔥[New Dataset]](https://github.com/henghuiding/gRefCOCO)**

This repository contains the code for the paper [GRES: Generalized Referring Expression Segmentation](https://arxiv.org/abs/2306.00968).
## Installation

The code is tested under CUDA 11.8, PyTorch 1.11.0, and Detectron2 0.6; a consolidated setup sketch follows the step list below.

1. Install [detectron2](https://github.com/facebookresearch/detectron2) following the [manual](https://detectron2.readthedocs.io/en/latest/)
2. Run `sh make.sh` under `gres_model/modeling/pixel_decoder/ops`
3. Install the other required packages: `pip install -r requirements.txt`
4. Prepare the dataset following `datasets/DATASET.md`
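As a convenience, here is a minimal sketch that strings the four steps together. The conda environment name, Python version, wheel choice, and the install-from-source command for detectron2 are assumptions rather than part of the original instructions; pick the PyTorch 1.11.0 build that matches your CUDA toolkit.

```
# Hypothetical consolidated setup; environment name and versions are assumptions
conda create -n gres python=3.8 -y
conda activate gres

# PyTorch 1.11.0: choose the build matching your CUDA toolkit (see pytorch.org)
pip install torch==1.11.0 torchvision==0.12.0

# Step 1: detectron2 (tested with v0.6)
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Step 2: compile the deformable-attention ops
cd gres_model/modeling/pixel_decoder/ops && sh make.sh && cd -

# Step 3: other required packages
pip install -r requirements.txt

# Step 4: prepare the dataset following datasets/DATASET.md
```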
## Inference

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto --eval-only \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```
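For a quick sanity run on a single GPU, the same entry point works with the launcher flags scaled down; a sketch with the same placeholder paths:

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 1 --eval-only \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```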
## Training

First, download the backbone weights (`swin_base_patch4_window12_384_22k.pth`) and convert them into detectron2 format (`swin_base_patch4_window12_384_22k.pkl`) using the script:

```
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_base_patch4_window12_384_22k.pth swin_base_patch4_window12_384_22k.pkl
```

The converted file should match the `WEIGHTS` path in the config (e.g. `models/swin_base_patch4_window12_384_22k.pkl` for `configs/referring_swin_base.yaml`). Then start training:

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto \
    MODEL.WEIGHTS [path_to_weights] \
    OUTPUT_DIR [output_dir]
```
To customize options, append config overrides to the command. For example:

```
SOLVER.IMS_PER_BATCH 48
SOLVER.BASE_LR 0.00001
```

For the full list of base options, see `configs/referring_R50.yaml` and `configs/Base-COCO-InstanceSegmentation.yaml`.
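Put together, a full training command with these overrides might look like the following; the `OUTPUT_DIR` value is an arbitrary example, and the weights path follows `configs/referring_swin_base.yaml`:

```
python train_net.py \
    --config-file configs/referring_swin_base.yaml \
    --num-gpus 8 --dist-url auto \
    MODEL.WEIGHTS models/swin_base_patch4_window12_384_22k.pkl \
    OUTPUT_DIR output/gres_swin_base \
    SOLVER.IMS_PER_BATCH 48 SOLVER.BASE_LR 0.00001
```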
## Models

[Onedrive](https://entuedu-my.sharepoint.com/:u:/g/personal/liuc0058_e_ntu_edu_sg/EbrpMReEP4RBoWEleQOHsPoBk9Ttj5SkxX4NJ1-vwYE-eQ?e=JzovhE)
## Acknowledgement

This project is based on [refer](https://github.com/lichengunc/refer), [maskformer](https://github.com/facebookresearch/Mask2Former), and [detectron2](https://github.com/facebookresearch/detectron2). Many thanks to the authors for their great work!

## BibTeX

Please consider citing GRES if it helps your research.
```latex
@inproceedings{GRES,
  title={{GRES}: Generalized Referring Expression Segmentation},
  author={Liu, Chang and Ding, Henghui and Jiang, Xudong},
  booktitle={CVPR},
  year={2023}
}
```
configs/Base-COCO-InstanceSegmentation.yaml

Lines changed: 50 additions & 0 deletions

MODEL:
  BACKBONE:
    FREEZE_AT: 0
    NAME: "build_resnet_backbone"
  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  RESNETS:
    DEPTH: 50
    STEM_TYPE: "basic"  # not used
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: False
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
    # NORM: "SyncBN"
    RES5_MULTI_GRID: [1, 1, 1]  # not used
DATASETS:
  REF_ROOT: "refer/data/"
  TRAIN: ("refcoco_unc_train",)
  TEST: ("refcoco_unc_val",)
REFERRING:
  BERT_TYPE: "bert-base-uncased"
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.0001
  STEPS: (327778, 355092)
  MAX_ITER: 368750
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 10
  WEIGHT_DECAY: 0.05
  OPTIMIZER: "ADAMW"
  BACKBONE_MULTIPLIER: 0.1
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "full_model"
    CLIP_VALUE: 0.01
    NORM_TYPE: 2.0
  AMP:
    ENABLED: True
INPUT:
  IMAGE_SIZE: 1024
  MIN_SCALE: 0.75
  MAX_SCALE: 1.0
  FORMAT: "RGB"
  DATASET_MAPPER_NAME: "refcoco"
TEST:
  EVAL_PERIOD: 5000
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: False
  NUM_WORKERS: 4
VERSION: 2

configs/referring_R50.yaml

Lines changed: 57 additions & 0 deletions

_BASE_: Base-COCO-InstanceSegmentation.yaml
MODEL:
  META_ARCHITECTURE: "GRES"
  SEM_SEG_HEAD:
    NAME: "ReferringHead"
    IGNORE_VALUE: 255
    NUM_CLASSES: 80
    LOSS_WEIGHT: 1.0
    CONVS_DIM: 256
    MASK_DIM: 256
    NORM: "GN"
    # pixel decoder
    PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
    COMMON_STRIDE: 4
    TRANSFORMER_ENC_LAYERS: 6
  MASK_FORMER:
    TRANSFORMER_DECODER_NAME: "MultiScaleMaskedReferringDecoder"
    TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
    DEEP_SUPERVISION: True
    LANG_ATT_WEIGHT: 0.1
    NO_OBJECT_WEIGHT: 0.1
    CLASS_WEIGHT: 2.0
    MASK_WEIGHT: 5.0
    DICE_WEIGHT: 5.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.0
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    PRE_NORM: False
    ENFORCE_INPUT_PROJ: False
    SIZE_DIVISIBILITY: 32
    DEC_LAYERS: 10  # 9 decoder layers, add one for the loss on learnable query
    TRAIN_NUM_POINTS: 12544
    OVERSAMPLE_RATIO: 3.0
    IMPORTANCE_SAMPLE_RATIO: 0.75
    TEST:
      SEMANTIC_ON: False
      INSTANCE_ON: True
      PANOPTIC_ON: False
      OVERLAP_THRESHOLD: 0.8
      OBJECT_MASK_THRESHOLD: 0.8
INPUT:
  IMAGE_SIZE: 480

DATASETS:
  TRAIN: ("grefcoco_unc_train",)
  TEST: ("grefcoco_unc_val",)

SOLVER:
  IMS_PER_BATCH: 48
  BASE_LR: 0.00001
  STEPS: (110000, 210000)
  MAX_ITER: 300000

configs/referring_swin_base.yaml

Lines changed: 16 additions & 0 deletions

_BASE_: referring_R50.yaml
MODEL:
  BACKBONE:
    NAME: "D2SwinTransformer"
  SWIN:
    EMBED_DIM: 128
    DEPTHS: [2, 2, 18, 2]
    NUM_HEADS: [4, 8, 16, 32]
    WINDOW_SIZE: 12
    APE: False
    DROP_PATH_RATE: 0.3
    PATCH_NORM: True
    PRETRAIN_IMG_SIZE: 384
  WEIGHTS: "models/swin_base_patch4_window12_384_22k.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]

configs/referring_swin_tiny.yaml

Lines changed: 15 additions & 0 deletions

_BASE_: referring_R50.yaml
MODEL:
  BACKBONE:
    NAME: "D2SwinTransformer"
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [2, 2, 6, 2]
    NUM_HEADS: [3, 6, 12, 24]
    WINDOW_SIZE: 7
    APE: False
    DROP_PATH_RATE: 0.3
    PATCH_NORM: True
  WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
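The config files above form a `_BASE_` chain: both Swin configs extend `referring_R50.yaml`, which extends `Base-COCO-InstanceSegmentation.yaml`, with later files overriding earlier ones. A minimal sketch of inspecting the merged result with detectron2's config API; allowing new keys here stands in for the project-specific config defaults (e.g. `REFERRING`, `MASK_FORMER`) that `train_net.py` would normally register:

```python
# Minimal sketch, assuming detectron2 is installed and this runs from the repo root.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.set_new_allowed(True)  # accept project-specific keys not in detectron2's defaults
cfg.merge_from_file("configs/referring_swin_base.yaml")  # resolves the _BASE_ chain

print(cfg.MODEL.BACKBONE.NAME)   # "D2SwinTransformer" (overridden by the Swin config)
print(cfg.SOLVER.IMS_PER_BATCH)  # 48, set in referring_R50.yaml
print(cfg.MODEL.SWIN.DEPTHS)     # [2, 2, 18, 2] for Swin-B
```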

datasets/DATASET.md

Lines changed: 16 additions & 0 deletions

## Dataset

The dataset folder should be organized as follows:

```
datasets
├── grefcoco
│   ├── grefs(unc).json
│   └── instances.json
└── images
    └── train2014
        ├── COCO_train2014_xxxxxxxxxxxx.jpg
        ├── COCO_train2014_xxxxxxxxxxxx.jpg
        └── ...
```

Download our gRefCOCO from the [gRefCOCO repository](https://github.com/henghuiding/gRefCOCO)!
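A small, hypothetical sanity check for the layout above, with the path names taken directly from the tree:

```python
# Hypothetical check that the expected dataset files are in place.
from pathlib import Path

root = Path("datasets")
expected = [
    root / "grefcoco" / "grefs(unc).json",
    root / "grefcoco" / "instances.json",
    root / "images" / "train2014",
]
for path in expected:
    print(f"{'ok' if path.exists() else 'MISSING':>7}  {path}")
```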
