- **'img_path'** (str): File path to the RGB image.
- **'depth_img_path'** (str): File path to the depth image.
- **'intrinsic'** (np.ndarray): Intrinsic parameters of the camera for RGB images.
- **'depth_intrinsic'** (np.ndarray): Intrinsic parameters of the camera for depth images.
- **'extrinsic'** (np.ndarray): Extrinsic parameters of the camera.
- **'visible_instance_id'** (list): IDs of visible objects in the image.
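The sketch below shows one way these fields fit together: it loads the RGB and depth images and back-projects the depth map into world coordinates using the intrinsics and extrinsics. It is a minimal illustration rather than devkit code; it assumes the dict (called `img_info` here) has already been retrieved from the MMScan API, that depth is stored in millimeters, and that `extrinsic` maps camera coordinates to world coordinates. Check the devkit for the exact conventions.

```python
import cv2  # assumption: OpenCV is used here only for image I/O
import numpy as np

def back_project(img_info, depth_scale=1000.0):
    """Back-project one image dict's depth map into world coordinates.

    `img_info` is assumed to hold the fields listed above; the depth unit
    (millimeters) and the camera-to-world direction of `extrinsic` are
    assumptions, not guaranteed by the devkit.
    """
    rgb = cv2.imread(img_info['img_path'])
    depth = cv2.imread(img_info['depth_img_path'], cv2.IMREAD_UNCHANGED) / depth_scale

    K = np.asarray(img_info['depth_intrinsic'])[:3, :3]   # 3x3 pinhole intrinsics
    cam2world = np.asarray(img_info['extrinsic'])          # 4x4 camera-to-world

    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, h*w)

    # Lift pixels into camera space, then transform into world space.
    cam_pts = np.linalg.inv(K) @ pixels * depth.reshape(1, -1)
    cam_pts = np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])
    world_pts = (cam2world @ cam_pts)[:3].T                # (h*w, 3) world points

    return rgb, world_pts, img_info['visible_instance_id']
```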
### MMScan Evaluator
For the visual grounding task, our evaluator computes multiple metrics, including:

- **AP and AR**: These metrics compute precision and recall by treating each sample as an individual category.
- **AP_C and AR_C**: These variants group samples that belong to the same subclass and compute the metrics over each group.
- **gTop-k**: A generalization of the traditional Top-k metric that offers greater flexibility and interpretability for multi-target grounding.
*Note:* Here, AP corresponds to AP<sub>sample</sub> in the paper, and AP_C corresponds to AP<sub>box</sub> in the paper.
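To make the gTop-k idea more concrete, here is a small sketch of how such a metric could be computed for a single sample. It follows one plausible reading of the definition, namely that a sample with n ground-truth boxes is judged against its top k·n predictions; the IoU threshold and matching rule are assumptions, so this is only an illustration, not the evaluator's actual implementation.

```python
import numpy as np

def gtop_k_recall(ious, scores, k=1, iou_thr=0.25):
    """Sketch of a gTop-k-style score for one sample.

    ious   : (num_preds, num_gts) IoU matrix between predictions and ground truths
    scores : (num_preds,) confidence of each predicted box
    Assumed rule: keep the top k * num_gts predictions by confidence and count
    a ground truth as recalled if any kept prediction overlaps it above iou_thr.
    """
    num_gts = ious.shape[1]
    keep = np.argsort(scores)[::-1][: k * num_gts]   # top k*n predictions
    recalled = (ious[keep] >= iou_thr).any(axis=0)   # per-ground-truth hit
    return recalled.mean()                           # fraction of targets recalled
```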
Below is an example of how to utilize the Visual Grounding Evaluator:
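As a hedged sketch of the usual accumulate-then-evaluate pattern: the class name `VisualGroundingEvaluator`, its constructor argument, the `update()` / `start_evaluation()` methods, and the result-dict keys below are assumptions, so consult the devkit source for the exact interface.

```python
import numpy as np

# Hypothetical sketch: the import path, class name, method names, and dict keys
# are assumptions rather than the devkit's confirmed API.
from mmscan import VisualGroundingEvaluator  # assumed import

evaluator = VisualGroundingEvaluator(show_results=True)  # assumed signature

# Dummy arrays stand in for real model output; 9 numbers per box here because
# MMScan grounding uses 9-DoF boxes (center, size, orientation).
batch_results = [{
    "pred_bboxes": np.random.rand(10, 9),  # predicted boxes for one sample
    "pred_scores": np.random.rand(10),     # per-box confidence
    "gt_bboxes": np.random.rand(2, 9),     # annotated target boxes
    "subclass": "example_subclass",        # grouping key for AP_C / AR_C (hypothetical)
    "index": 0,                            # sample identifier (hypothetical)
}]

evaluator.update(batch_results)             # call once per batch to accumulate
metric_dict = evaluator.start_evaluation()  # returns AP/AR, AP_C/AR_C, gTop-k
print(metric_dict)
```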
The input structure remains the same as for the question answering evaluator:
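The block below is a hedged illustration of that list-of-dicts layout; the key names are hypothetical, so check the devkit for the exact fields expected.

```python
# Hypothetical entry: the key names are assumptions, not the devkit's
# confirmed input format.
evaluator_input = [
    {
        "index": 0,
        "question": "What can the object beside the bed be used for?",
        "pred": ["Storing clothes."],                # model prediction(s)
        "gt": ["It can be used to store clothes."],  # reference answers
    },
]
```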
## 🏆 MMScan Benchmark

### MMScan Visual Grounding Benchmark

We have adapted the MMScan API for some [models](./models/README.md).
These are 3D visual grounding models adapted for the mmscan-devkit. Currently, two models have been released: EmbodiedScan and ScanRefer.
### ScanRefer

1. Follow the [ScanRefer](https://github.com/daveredrum/ScanRefer/blob/master/README.md) instructions to set up the environment. For data preparation, you do not need to load the datasets; only download the [preprocessed GLoVE embeddings](https://kaldir.vc.in.tum.de/glove.p) (~990MB) and put them under `data/`.

2. Install the MMScan API.

3. Set `CONF.PATH.OUTPUT` in `lib/config.py` to your desired output directory.

4. Run the following command to train ScanRefer (one GPU):
### EmbodiedScan

1. Follow the [EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan/blob/main/README.md) instructions to set up the environment. Download the [Multi-View 3D Detection model's weights](https://download.openmmlab.com/mim-example/embodiedscan/mv-3ddet.pth) and change the "load_from" path in the config file under `configs/grounding` to the path where the weights are saved.

2. Install the MMScan API.

3. Run the following command to train EmbodiedScan (multiple GPUs):
These are 3D question answering models adapted for the mmscan-devkit. Currently, two models have been released: LL3DA and LEO.
### LL3DA
1. Follow the [LL3DA](https://github.com/Open3DA/LL3DA/blob/main/README.md) instructions to set up the environment. For data preparation, you do not need to load the datasets; you only need to:

(1) download the [release pre-trained weights](https://huggingface.co/CH3COOK/LL3DA-weight-release/blob/main/ll3da-opt-1.3b.pth) and put them under `./pretrained`

3. Edit the config under `./scripts/opt-1.3b/eval.mmscanqa.sh` and `./scripts/opt-1.3b/tuning.mmscanqa.sh`.

4. Run the following command to train LL3DA (4 GPUs):

```bash
bash scripts/opt-1.3b/tuning.mmscanqa.sh
```

5. Run the following command to evaluate LL3DA (4 GPUs):

```bash
bash scripts/opt-1.3b/eval.mmscanqa.sh
```
### LEO

1. Follow the [LEO](https://github.com/embodied-generalist/embodied-generalist/blob/main/README.md) instructions to set up the environment. For data preparation, you do not need to load the datasets; you only need to:

(1) Download [Vicuna-7B](https://huggingface.co/huangjy-pku/vicuna-7b/tree/main) and update `cfg_path` in `configs/llm/*.yaml`

3. Edit the config under `scripts/train_tuning_mmscan.sh` and `scripts/test_tuning_mmscan.sh`.

4. Run the following command to train LEO (4 GPUs):

```bash
bash scripts/train_tuning_mmscan.sh
```

5. Run the following command to evaluate LEO (4 GPUs):

```bash
bash scripts/test_tuning_mmscan.sh
```
PS: LEO may encounter a "NaN" error in the MultiHeadAttentionSpatial module due to the training setup when training for more epochs (this does not occur with 4 GPUs for one epoch).