Commit ee2e542

Add GroundingDINO on ODinW results, and support caption prompt of GroundingDINO (#11187)
1 parent 24bb129 commit ee2e542

6 files changed, +2409 -80 lines changed

configs/odinw/README.md

Lines changed: 59 additions & 57 deletions
@@ -6,7 +6,7 @@
 
 ## Get Started
 
-1. To download dataset, you can refer to [reference document](../../docs/zh_cn/user_guides/dataset_prepare.md)
+1. To download dataset, you can refer to [reference document](../../docs/zh_cn/user_guides/dataset_prepare.md)
 
 2. You can use the following data to run the inference.
 
@@ -22,73 +22,75 @@ Learning visual representations from natural language supervision has recently s
 
 ## Results and models of odinw13
 
-| Method | GLIP-T(A) | Official | GLIP-T(B) | Official | GLIP-T(C) | Official |
-| --------------------- | --------- | --------- | --------- | --------- | --------- | --------- |
-| AerialMaritimeDrone | 0.123 | 0.122 | 0.110 | 0.11 | 0.130 | 0.130 |
-| Aquarium | 0.175 | 0.174 | 0.173 | 0.169 | 0.191 | 0.190 |
-| CottontailRabbits | 0.686 | 0.686 | 0.688 | 0.688 | 0.744 | 0.744 |
-| EgoHands | 0.013 | 0.013 | 0.003 | 0.540 | 0.314 | 0.315 |
-| NorthAmericaMushrooms | 0.502 | 0.502 | 0.367 | 0.051 | 0.297 | 0.296 |
-| Packages | 0.589 | 0.589 | 0.083 | 0.030 | 0.699 | 0.699 |
-| PascalVOC | 0.512 | 0.512 | 0.541 | 0.288 | 0.565 | 0.565 |
-| pistols | 0.339 | 0.339 | 0.502 | 0.338 | 0.503 | 0.504 |
-| pothole | 0.007 | 0.007 | 0.030 | 0.475 | 0.058 | 0.058 |
-| Raccoon | 0.075 | 0.075 | 0.285 | 0.288 | 0.241 | 0.244 |
-| ShellfishOpenImages | 0.372 | 0.372 | 0.337 | 0.338 | 0.300 | 0.302 |
-| thermalDogsAndPeople | 0.372 | 0.372 | 0.475 | 0.475 | 0.510 | 0.510 |
-| VehiclesOpenImages | 0.574 | 0.574 | 0.562 | 0.547 | 0.549 | 0.534 |
-| Average | **0.334** | **0.324** | **0.320** | **0.318** | **0.392** | **0.392** |
+| Method | GLIP-T(A) | Official | GLIP-T(B) | Official | GLIP-T(C) | Official | GroundingDINO-T | GroundingDINO-B |
+| --------------------- | --------- | --------- | --------- | --------- | --------- | --------- | --------------- | --------------- |
+| AerialMaritimeDrone | 0.123 | 0.122 | 0.110 | 0.110 | 0.130 | 0.130 | 0.173 | 0.281 |
+| Aquarium | 0.175 | 0.174 | 0.173 | 0.169 | 0.191 | 0.190 | 0.195 | 0.445 |
+| CottontailRabbits | 0.686 | 0.686 | 0.688 | 0.688 | 0.744 | 0.744 | 0.799 | 0.808 |
+| EgoHands | 0.013 | 0.013 | 0.003 | 0.004 | 0.314 | 0.315 | 0.608 | 0.764 |
+| NorthAmericaMushrooms | 0.502 | 0.502 | 0.367 | 0.367 | 0.297 | 0.296 | 0.507 | 0.675 |
+| Packages | 0.589 | 0.589 | 0.083 | 0.083 | 0.699 | 0.699 | 0.687 | 0.670 |
+| PascalVOC | 0.512 | 0.512 | 0.541 | 0.540 | 0.565 | 0.565 | 0.563 | 0.711 |
+| pistols | 0.339 | 0.339 | 0.502 | 0.501 | 0.503 | 0.504 | 0.726 | 0.771 |
+| pothole | 0.007 | 0.007 | 0.030 | 0.030 | 0.058 | 0.058 | 0.215 | 0.478 |
+| Raccoon | 0.075 | 0.074 | 0.285 | 0.288 | 0.241 | 0.244 | 0.549 | 0.541 |
+| ShellfishOpenImages | 0.253 | 0.253 | 0.337 | 0.338 | 0.300 | 0.302 | 0.393 | 0.650 |
+| thermalDogsAndPeople | 0.372 | 0.372 | 0.475 | 0.475 | 0.510 | 0.510 | 0.657 | 0.633 |
+| VehiclesOpenImages | 0.574 | 0.566 | 0.562 | 0.547 | 0.549 | 0.534 | 0.613 | 0.647 |
+| Average | **0.325** | **0.324** | **0.320** | **0.318** | **0.392** | **0.392** | **0.514** | **0.621** |
 
 Note:
 
 1. The above are zero-shot evaluation results.
-2. The config and weights can be found at [here](../glip/README.md)
+2. The config and weights of GLIP models can be found at [here](../glip/README.md)
+3. The config and weights of GroundingDINO models can be found at [here](../grounding_dino/README.md)
 
 ## Results and models of odinw35
 
-| Method | GLIP-T(A) | Official | GLIP-T(B) | Official | GLIP-T(C) | Official |
-| --------------------------- | --------- | --------- | --------- | --------- | --------- | --------- |
-| AerialMaritimeDrone_large | 0.123 | 0.122 | 0.110 | 0.110 | 0.130 | 0.130 |
-| AerialMaritimeDrone_tiled | 0.174 | 0.174 | 0.172 | 0.172 | 0.172 | 0.172 |
-| AmericanSignLanguageLetters | 0.001 | 0.001 | 0.003 | 0.003 | 0.009 | 0.009 |
-| Aquarium | 0.175 | 0.175 | 0.173 | 0.171 | 0.192 | 0.182 |
-| BCCD | 0.016 | 0.016 | 0.001 | 0.001 | 0.000 | 0.000 |
-| boggleBoards | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
-| brackishUnderwater | 0.016 | 0.013 | 0.021 | 0.027 | 0.020 | 0.022 |
-| ChessPieces | 0.001 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001 |
-| CottontailRabbits | 0.710 | 0.709 | 0.683 | 0.683 | 0.752 | 0.752 |
-| dice | 0.005 | 0.005 | 0.004 | 0.004 | 0.004 | 0.004 |
-| DroneControl | 0.016 | 0.017 | 0.006 | 0.008 | 0.005 | 0.007 |
-| EgoHands_generic | 0.009 | 0.010 | 0.005 | 0.006 | 0.510 | 0.508 |
-| EgoHands_specific | 0.001 | 0.001 | 0.004 | 0.006 | 0.003 | 0.004 |
-| HardHatWorkers | 0.029 | 0.029 | 0.023 | 0.023 | 0.033 | 0.033 |
-| MaskWearing | 0.007 | 0.007 | 0.003 | 0.002 | 0.005 | 0.005 |
-| MountainDewCommercial | 0.218 | 0.227 | 0.199 | 0.197 | 0.478 | 0.463 |
-| NorthAmericaMushrooms | 0.502 | 0.502 | 0.450 | 0.450 | 0.497 | 0.497 |
-| openPoetryVision | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
-| OxfordPets_by_breed | 0.001 | 0.002 | 0.002 | 0.004 | 0.001 | 0.002 |
-| OxfordPets_by_species | 0.016 | 0.011 | 0.012 | 0.009 | 0.013 | 0.009 |
-| PKLot | 0.002 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 |
-| Packages | 0.569 | 0.569 | 0.279 | 0.279 | 0.712 | 0.712 |
-| PascalVOC | 0.512 | 0.512 | 0.541 | 0.540 | 0.565 | 0.565 |
-| pistols | 0.339 | 0.339 | 0.502 | 0.501 | 0.503 | 0.504 |
-| plantdoc | 0.002 | 0.002 | 0.007 | 0.007 | 0.009 | 0.009 |
-| pothole | 0.007 | 0.010 | 0.024 | 0.025 | 0.085 | 0.101 |
-| Raccoons | 0.075 | 0.074 | 0.285 | 0.288 | 0.241 | 0.244 |
-| selfdrivingCar | 0.071 | 0.072 | 0.074 | 0.074 | 0.081 | 0.080 |
-| ShellfishOpenImages | 0.253 | 0.253 | 0.337 | 0.338 | 0.300 | 0.302 |
-| ThermalCheetah | 0.028 | 0.028 | 0.000 | 0.000 | 0.028 | 0.028 |
-| thermalDogsAndPeople | 0.372 | 0.372 | 0.475 | 0.475 | 0.510 | 0.510 |
-| UnoCards | 0.000 | 0.000 | 0.000 | 0.001 | 0.002 | 0.003 |
-| VehiclesOpenImages | 0.574 | 0.566 | 0.562 | 0.547 | 0.549 | 0.534 |
-| WildfireSmoke | 0.000 | 0.000 | 0.000 | 0.000 | 0.017 | 0.017 |
-| websiteScreenshots | 0.003 | 0.004 | 0.003 | 0.005 | 0.005 | 0.006 |
-| Average | **0.134** | **0.134** | **0.138** | **0.138** | **0.179** | **0.178** |
+| Method | GLIP-T(A) | Official | GLIP-T(B) | Official | GLIP-T(C) | Official | GroundingDINO-T | GroundingDINO-B |
+| --------------------------- | --------- | --------- | --------- | --------- | --------- | --------- | --------------- | --------------- |
+| AerialMaritimeDrone_large | 0.123 | 0.122 | 0.110 | 0.110 | 0.130 | 0.130 | 0.173 | 0.281 |
+| AerialMaritimeDrone_tiled | 0.174 | 0.174 | 0.172 | 0.172 | 0.172 | 0.172 | 0.206 | 0.364 |
+| AmericanSignLanguageLetters | 0.001 | 0.001 | 0.003 | 0.003 | 0.009 | 0.009 | 0.002 | 0.096 |
+| Aquarium | 0.175 | 0.175 | 0.173 | 0.171 | 0.192 | 0.182 | 0.195 | 0.445 |
+| BCCD | 0.016 | 0.016 | 0.001 | 0.001 | 0.000 | 0.000 | 0.161 | 0.584 |
+| boggleBoards | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.134 |
+| brackishUnderwater | 0.016 | 0.013 | 0.021 | 0.027 | 0.020 | 0.022 | 0.021 | 0.454 |
+| ChessPieces | 0.001 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 |
+| CottontailRabbits | 0.710 | 0.709 | 0.683 | 0.683 | 0.752 | 0.752 | 0.806 | 0.797 |
+| dice | 0.005 | 0.005 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.082 |
+| DroneControl | 0.016 | 0.017 | 0.006 | 0.008 | 0.005 | 0.007 | 0.042 | 0.638 |
+| EgoHands_generic | 0.009 | 0.010 | 0.005 | 0.006 | 0.510 | 0.508 | 0.608 | 0.764 |
+| EgoHands_specific | 0.001 | 0.001 | 0.004 | 0.006 | 0.003 | 0.004 | 0.002 | 0.687 |
+| HardHatWorkers | 0.029 | 0.029 | 0.023 | 0.023 | 0.033 | 0.033 | 0.046 | 0.439 |
+| MaskWearing | 0.007 | 0.007 | 0.003 | 0.002 | 0.005 | 0.005 | 0.004 | 0.406 |
+| MountainDewCommercial | 0.218 | 0.227 | 0.199 | 0.197 | 0.478 | 0.463 | 0.430 | 0.580 |
+| NorthAmericaMushrooms | 0.502 | 0.502 | 0.450 | 0.450 | 0.497 | 0.497 | 0.471 | 0.501 |
+| openPoetryVision | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.051 |
+| OxfordPets_by_breed | 0.001 | 0.002 | 0.002 | 0.004 | 0.001 | 0.002 | 0.003 | 0.799 |
+| OxfordPets_by_species | 0.016 | 0.011 | 0.012 | 0.009 | 0.013 | 0.009 | 0.011 | 0.872 |
+| PKLot | 0.002 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.774 |
+| Packages | 0.569 | 0.569 | 0.279 | 0.279 | 0.712 | 0.712 | 0.695 | 0.728 |
+| PascalVOC | 0.512 | 0.512 | 0.541 | 0.540 | 0.565 | 0.565 | 0.563 | 0.711 |
+| pistols | 0.339 | 0.339 | 0.502 | 0.501 | 0.503 | 0.504 | 0.726 | 0.771 |
+| plantdoc | 0.002 | 0.002 | 0.007 | 0.007 | 0.009 | 0.009 | 0.005 | 0.376 |
+| pothole | 0.007 | 0.010 | 0.024 | 0.025 | 0.085 | 0.101 | 0.215 | 0.478 |
+| Raccoons | 0.075 | 0.074 | 0.285 | 0.288 | 0.241 | 0.244 | 0.549 | 0.541 |
+| selfdrivingCar | 0.071 | 0.072 | 0.074 | 0.074 | 0.081 | 0.080 | 0.089 | 0.318 |
+| ShellfishOpenImages | 0.253 | 0.253 | 0.337 | 0.338 | 0.300 | 0.302 | 0.393 | 0.650 |
+| ThermalCheetah | 0.028 | 0.028 | 0.000 | 0.000 | 0.028 | 0.028 | 0.087 | 0.290 |
+| thermalDogsAndPeople | 0.372 | 0.372 | 0.475 | 0.475 | 0.510 | 0.510 | 0.657 | 0.633 |
+| UnoCards | 0.000 | 0.000 | 0.000 | 0.001 | 0.002 | 0.003 | 0.006 | 0.754 |
+| VehiclesOpenImages | 0.574 | 0.566 | 0.562 | 0.547 | 0.549 | 0.534 | 0.613 | 0.647 |
+| WildfireSmoke | 0.000 | 0.000 | 0.000 | 0.000 | 0.017 | 0.017 | 0.134 | 0.410 |
+| websiteScreenshots | 0.003 | 0.004 | 0.003 | 0.005 | 0.005 | 0.006 | 0.012 | 0.175 |
+| Average | **0.134** | **0.134** | **0.138** | **0.138** | **0.179** | **0.178** | **0.227** | **0.492** |
 
 Note:
 
 1. The above are zero-shot evaluation results.
-2. The config and weights can be found at [here](../glip/README.md)
+2. The config and weights of GLIP models can be found at [here](../glip/README.md)
+3. The config and weights of GroundingDINO models can be found at [here](../grounding_dino/README.md)
 
 ## Citation

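Review note: the Average row of the odinw13 table can be sanity-checked against the per-dataset rows. The sketch below is a minimal illustration, not part of the commit. It assumes each Average is the plain arithmetic mean of the 13 dataset rows rounded to three decimals; the helper name `column_averages` and the constant `ODINW13_GROUNDING_DINO` are hypothetical, and the numbers are the GroundingDINO-T and GroundingDINO-B cells copied from the added rows of the diff.

```python
from statistics import mean

# GroundingDINO-T and GroundingDINO-B cells of the odinw13 rows in the diff
# (dataset name, then the two new columns); the Average row is left out.
ODINW13_GROUNDING_DINO = """
| AerialMaritimeDrone | 0.173 | 0.281 |
| Aquarium | 0.195 | 0.445 |
| CottontailRabbits | 0.799 | 0.808 |
| EgoHands | 0.608 | 0.764 |
| NorthAmericaMushrooms | 0.507 | 0.675 |
| Packages | 0.687 | 0.670 |
| PascalVOC | 0.563 | 0.711 |
| pistols | 0.726 | 0.771 |
| pothole | 0.215 | 0.478 |
| Raccoon | 0.549 | 0.541 |
| ShellfishOpenImages | 0.393 | 0.650 |
| thermalDogsAndPeople | 0.657 | 0.633 |
| VehiclesOpenImages | 0.613 | 0.647 |
"""


def column_averages(table: str) -> list[float]:
    """Return the mean of each numeric column, rounded to three decimals."""
    rows = []
    for line in table.strip().splitlines():
        # Drop the outer pipes, split into cells, and skip the name cell.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append([float(cell) for cell in cells[1:]])
    # zip(*rows) transposes the per-dataset rows into per-model columns.
    return [round(mean(column), 3) for column in zip(*rows)]


print(column_averages(ODINW13_GROUNDING_DINO))  # [0.514, 0.621]
```

Under that assumption the output agrees with the **0.514** and **0.621** entries in the added Average row. The same check applied to the new GLIP-T(A) column gives 0.325, which is consistent with the Average cell changing from 0.334 to 0.325 in this commit.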