Cityscapes: https://www.cityscapes-dataset.com/
Our code expects the Cityscapes dataset directory to be organized as follows:
```
cityscapes
├── gtFine
│   ├── train
│   └── val
└── leftImg8bit
    ├── train
    └── val
```
ADE20K: https://groups.csail.mit.edu/vision/datasets/ADE20K/
Our code expects the ADE20K dataset directory to be organized as follows:
```
ade20k
├── annotations
│   ├── training
│   └── validation
└── images
    ├── training
    └── validation
```
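As a quick sanity check before evaluation, a small sketch along the lines below (not part of the repo; `check_dataset_layout` and the dataset root paths are placeholders) can confirm that both directories match the expected layout:

```python
from pathlib import Path

# Hypothetical helper: verify the directory layouts described above.
EXPECTED = {
    "cityscapes": ["gtFine/train", "gtFine/val", "leftImg8bit/train", "leftImg8bit/val"],
    "ade20k": ["annotations/training", "annotations/validation", "images/training", "images/validation"],
}

def check_dataset_layout(root: str, name: str) -> None:
    missing = [sub for sub in EXPECTED[name] if not (Path(root) / sub).is_dir()]
    if missing:
        raise FileNotFoundError(f"{name}: missing sub-directories {missing} under {root}")

check_dataset_layout("/path/to/cityscapes", "cityscapes")  # placeholder paths
check_dataset_layout("/path/to/ade20k", "ade20k")
```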
Latency/throughput is measured on NVIDIA Jetson Nano, NVIDIA Jetson AGX Orin, and NVIDIA A100 GPU with TensorRT in fp16; data transfer time is included.

Please put the downloaded checkpoints under `${efficientvit_repo}/assets/checkpoints/efficientvit_seg/`.
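For example, a downloaded checkpoint would end up at a path like the one below; the file name used here is hypothetical, so keep whatever name the downloaded checkpoint actually has:

```python
from pathlib import Path

# Directory the repo expects (relative to the repo root).
ckpt_dir = Path("assets/checkpoints/efficientvit_seg")
ckpt_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical file name for illustration only.
ckpt = ckpt_dir / "efficientvit_seg_l2_cityscapes.pt"
print(f"{ckpt} exists: {ckpt.is_file()}")
```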
EfficientViT L series
| Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs1) | Checkpoint |
|---|---|---|---|---|---|---|---|
| EfficientViT-L1 | 1024x2048 | 82.716 | 40M | 282G | 45.9ms | 122 image/s | link |
| EfficientViT-L2 | 1024x2048 | 83.228 | 53M | 396G | 60.0ms | 102 image/s | link |
EfficientViT B series
| Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Nano Latency (bs1) | Jetson Orin Latency (bs1) | Checkpoint |
|---|---|---|---|---|---|---|---|
| EfficientViT-B0 | 1024x2048 | 75.653 | 0.7M | 4.4G | 275ms | 9.9ms | link |
| EfficientViT-B1 | 1024x2048 | 80.547 | 4.8M | 25G | 819ms | 24.3ms | link |
| EfficientViT-B2 | 1024x2048 | 82.073 | 15M | 74G | 1676ms | 46.5ms | link |
| EfficientViT-B3 | 1024x2048 | 83.016 | 40M | 179G | 3192ms | 81.8ms | link |
EfficientViT L series
| Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
|---|---|---|---|---|---|---|---|
| EfficientViT-L1 | 512x512 | 49.191 | 40M | 36G | 7.2ms | 947 image/s | link |
| EfficientViT-L2 | 512x512 | 50.702 | 51M | 45G | 9.0ms | 758 image/s | link |
EfficientViT B series
| Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Nano Latency (bs1) | Jetson Orin Latency (bs1) | Checkpoint |
|---|---|---|---|---|---|---|---|
| EfficientViT-B1 | 512x512 | 42.840 | 4.8M | 3.1G | 110ms | 4.0ms | link |
| EfficientViT-B2 | 512x512 | 45.941 | 15M | 9.1G | 212ms | 7.3ms | link |
| EfficientViT-B3 | 512x512 | 49.013 | 39M | 22G | 411ms | 12.5ms | link |
```python
# semantic segmentation
from efficientvit.seg_model_zoo import create_efficientvit_seg_model

# Checkpoints are expected under assets/checkpoints/efficientvit_seg/ (see above).
model_cityscapes = create_efficientvit_seg_model(name="efficientvit-seg-l2-cityscapes", pretrained=True)
model_ade20k = create_efficientvit_seg_model(name="efficientvit-seg-l2-ade20k", pretrained=True)
```
Please run eval_efficientvit_seg_model.py to evaluate our models; see the segmentation examples for reference commands.
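For context, the mIoU numbers reported above are the class-wise mean of intersection-over-union between predicted and ground-truth masks. A minimal reference computation (not the repo's evaluator; the ignore label 255 is a common convention assumed here) looks like this:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int, ignore_index: int = 255) -> float:
    """Reference mIoU over a pair of class-index maps of the same shape."""
    valid = gt != ignore_index
    pred = pred[valid].astype(np.int64)
    gt = gt[valid].astype(np.int64)
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(gt * num_classes + pred, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - intersection
    iou = intersection / np.maximum(union, 1)
    # Average only over classes that actually appear.
    return float(iou[union > 0].mean())
```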
Please run demo_efficientvit_seg_model.py to visualize the models' predictions.
Examples:
```bash
python applications/efficientvit_seg/demo_efficientvit_seg_model.py --image_path assets/fig/indoor.jpg --dataset ade20k --crop_size 512 --model efficientvit-seg-l2-ade20k
python applications/efficientvit_seg/demo_efficientvit_seg_model.py --image_path assets/fig/city.png --dataset cityscapes --crop_size 1024 --model efficientvit-seg-l2-cityscapes
```
To generate ONNX files, please refer to onnx_export.py.
Example:
```bash
python assets/onnx_export.py --export_path assets/export_models/efficientvit_seg_l2_cityscapes_r1024x2048.onnx --task seg --model efficientvit-seg-l2-cityscapes --resolution 1024 2048 --bs 1
```
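A quick way to sanity-check the exported ONNX file is to run it with onnxruntime. The sketch below feeds a random tensor at the exported resolution (input/output names are read from the graph rather than assumed); it only verifies that the graph executes, not accuracy or latency:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "assets/export_models/efficientvit_seg_l2_cityscapes_r1024x2048.onnx",
    providers=["CPUExecutionProvider"],
)
inp = sess.get_inputs()[0]

# Dummy input matching the exported resolution; real inputs should be
# preprocessed the same way as for the PyTorch model.
dummy = np.random.rand(1, 3, 1024, 2048).astype(np.float32)
outputs = sess.run(None, {inp.name: dummy})
print(inp.name, [o.shape for o in outputs])
```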
To generate TFLite files, please refer to tflite_export.py.
Example:
```bash
python assets/tflite_export.py --export_path assets/export_models/efficientvit_seg_l2_ade20k_r512x512.tflite --task seg --model efficientvit-seg-l2-ade20k --resolution 512 512
```
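Likewise, the exported TFLite model can be smoke-tested with the TFLite interpreter. The input shape and dtype below are taken from the interpreter's reported input details rather than assumed:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="assets/export_models/efficientvit_seg_l2_ade20k_r512x512.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Build a dummy input with whatever shape/dtype the converted model reports.
dummy = np.random.rand(*input_details["shape"]).astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details["index"]).shape)
```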
If EfficientViT is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
```bibtex
@inproceedings{cai2023efficientvit,
  title={Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction},
  author={Cai, Han and Li, Junyan and Hu, Muyan and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17302--17313},
  year={2023}
}
```