EfficientViT Segmentation

Datasets

Cityscapes: https://www.cityscapes-dataset.com/
Our code expects the Cityscapes dataset directory to have the following structure:

cityscapes
├── gtFine
│   ├── train
│   └── val
└── leftImg8bit
    ├── train
    └── val
ADE20K: https://groups.csail.mit.edu/vision/datasets/ADE20K/
Our code expects the ADE20K dataset directory to have the following structure:

ade20k
├── annotations
│   ├── training
│   └── validation
└── images
    ├── training
    └── validation
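Before training or evaluation, it can help to confirm that a dataset directory matches the layouts above. The following is a minimal sketch, not part of the repository: the helper `missing_dirs` and the table `EXPECTED_LAYOUT` are hypothetical names, and the sub-directory lists are copied from the two directory trees.

```python
from pathlib import Path

# Expected sub-directories, copied from the two directory trees above.
EXPECTED_LAYOUT = {
    "cityscapes": [
        "gtFine/train",
        "gtFine/val",
        "leftImg8bit/train",
        "leftImg8bit/val",
    ],
    "ade20k": [
        "annotations/training",
        "annotations/validation",
        "images/training",
        "images/validation",
    ],
}


def missing_dirs(root, dataset):
    """Return the expected sub-directories that are absent under `root`.

    `missing_dirs` is a hypothetical helper for illustration only.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_LAYOUT[dataset] if not (root / rel).is_dir()]
```

An empty list from `missing_dirs("/path/to/cityscapes", "cityscapes")` indicates that every expected sub-directory exists; the check only looks at directory names, not at file contents.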

Pretrained EfficientViT Segmentation Models

Latency and throughput are measured on NVIDIA Jetson Nano, NVIDIA Jetson AGX Orin, and NVIDIA A100 GPU with TensorRT in fp16. Data transfer time is included. Place the downloaded checkpoints under ${efficientvit_repo}/assets/checkpoints/efficientvit_seg/

Cityscapes

| Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs1) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-L1 | 1024x2048 | 82.716 | 40M | 282G | 45.9ms | 122 image/s | link |
| EfficientViT-L2 | 1024x2048 | 83.228 | 53M | 396G | 60.0ms | 102 image/s | link |
EfficientViT B series
| Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-B0 | 1024x2048 | 75.653 | 0.7M | 4.4G | 275ms | 9.9ms | link |
| EfficientViT-B1 | 1024x2048 | 80.547 | 4.8M | 25G | 819ms | 24.3ms | link |
| EfficientViT-B2 | 1024x2048 | 82.073 | 15M | 74G | 1676ms | 46.5ms | link |
| EfficientViT-B3 | 1024x2048 | 83.016 | 40M | 179G | 3192ms | 81.8ms | link |

ADE20K

| Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-L1 | 512x512 | 49.191 | 40M | 36G | 7.2ms | 947 image/s | link |
| EfficientViT-L2 | 512x512 | 50.702 | 51M | 45G | 9.0ms | 758 image/s | link |
EfficientViT B series
| Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientViT-B1 | 512x512 | 42.840 | 4.8M | 3.1G | 110ms | 4.0ms | link |
| EfficientViT-B2 | 512x512 | 45.941 | 15M | 9.1G | 212ms | 7.3ms | link |
| EfficientViT-B3 | 512x512 | 49.013 | 39M | 22G | 411ms | 12.5ms | link |
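The tables above report latency for the Jetson devices and throughput for the A100, so the two columns are not directly comparable. As a rough aid, a batch latency can be converted to images per second. This sketch assumes batches are processed back to back with no pipelining or overlap, which may not hold on real hardware; the function name is hypothetical and the latencies are copied from the ADE20K tables.

```python
def latency_ms_to_images_per_second(latency_ms, batch_size=1):
    """Convert a per-batch latency in milliseconds to images per second.

    Assumes back-to-back execution without overlap (a simplification).
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return batch_size * 1000.0 / latency_ms


# Jetson Orin latencies (bs1) copied from the ADE20K tables above.
orin_latency_ms = {
    "EfficientViT-L1": 7.2,
    "EfficientViT-L2": 9.0,
    "EfficientViT-B3": 12.5,
}

for name, ms in orin_latency_ms.items():
    print(f"{name}: {latency_ms_to_images_per_second(ms):.1f} image/s")
```

The result is only an estimate of single-stream throughput; measured throughput at larger batch sizes, such as the bs16 A100 column, can be much higher than this conversion suggests.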

Usage

# Semantic segmentation
from efficientvit.seg_model_zoo import create_efficientvit_seg_model

# Load the L2 model pretrained on Cityscapes
model = create_efficientvit_seg_model(name="efficientvit-seg-l2-cityscapes", pretrained=True)

# Or load the L2 model pretrained on ADE20K (this rebinds `model`)
model = create_efficientvit_seg_model(name="efficientvit-seg-l2-ade20k", pretrained=True)

Evaluation

Please run eval_efficientvit_seg_model.py to evaluate our models.

Examples: segmentation
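The mIoU columns in the tables above report mean intersection-over-union as a percentage. The following is a minimal sketch of the standard metric on flattened label sequences. It is not the code in eval_efficientvit_seg_model.py, whose exact handling may differ. It assumes that predictions are valid class ids, that unlabeled ground-truth pixels carry an ignore label (255 here is an assumed convention), and that classes absent from both prediction and ground truth are skipped in the average.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (in [0, 1]) over classes that occur in `pred` or `target`.

    `pred` and `target` are flat sequences of integer class ids.
    Per-class IoU is TP / (TP + FP + FN).
    """
    intersection = [0] * num_classes  # TP per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

Multiplying the result by 100 gives a value on the same scale as the tables. In practice the counts would be accumulated over the whole validation set before dividing, rather than averaged per image.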

Visualization

Please run demo_efficientvit_seg_model.py to visualize the models.

Example:

python applications/efficientvit_seg/demo_efficientvit_seg_model.py --image_path assets/fig/indoor.jpg --dataset ade20k --crop_size 512 --model efficientvit-seg-l2-ade20k

python applications/efficientvit_seg/demo_efficientvit_seg_model.py --image_path assets/fig/city.png --dataset cityscapes --crop_size 1024 --model efficientvit-seg-l2-cityscapes

Export

ONNX

To generate ONNX files, please refer to onnx_export.py.

Example:

python assets/onnx_export.py --export_path assets/export_models/efficientvit_seg_l2_cityscapes_r1024x2048.onnx --task seg --model efficientvit-seg-l2-cityscapes --resolution 1024 2048 --bs 1

TFLite

To generate TFLite files, please refer to tflite_export.py.

Example:

python assets/tflite_export.py --export_path assets/export_models/efficientvit_seg_l2_ade20k_r512x512.tflite --task seg --model efficientvit-seg-l2-ade20k --resolution 512 512
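When exporting several models, the commands above can be assembled programmatically. This is a minimal sketch that only uses the script paths and flags shown in the two examples. The helper `build_export_command` is hypothetical, and the output file name pattern (model name with underscores, an `_r<first>x<second>` resolution suffix, and an extension matching the format) is an assumption inferred from those examples, not a requirement of the scripts.

```python
import shlex

# Script paths taken from the export examples above.
EXPORT_SCRIPTS = {
    "onnx": "assets/onnx_export.py",
    "tflite": "assets/tflite_export.py",
}


def build_export_command(fmt, model, resolution, batch_size=None):
    """Return the export command as an argument list (hypothetical helper).

    The export path naming is inferred from the examples and is an assumption.
    """
    script = EXPORT_SCRIPTS[fmt]
    first, second = resolution
    # "efficientvit-seg-l2-ade20k" -> "efficientvit_seg_l2_ade20k"
    stem = model.replace("-", "_")
    export_path = f"assets/export_models/{stem}_r{first}x{second}.{fmt}"
    cmd = [
        "python", script,
        "--export_path", export_path,
        "--task", "seg",
        "--model", model,
        "--resolution", str(first), str(second),
    ]
    if batch_size is not None:
        # --bs appears only in the ONNX example above.
        cmd += ["--bs", str(batch_size)]
    return cmd


cmd = build_export_command(
    "onnx", "efficientvit-seg-l2-cityscapes", (1024, 2048), batch_size=1
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; printing it first with `shlex.join` makes it easy to compare against the documented examples.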

Reference

If EfficientViT is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

@inproceedings{cai2023efficientvit,
  title={Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction},
  author={Cai, Han and Li, Junyan and Hu, Muyan and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17302--17313},
  year={2023}
}