TubeR: Tubelet Transformer for Video Action Detection

This repository (amazon-science/tubelet-transformer) contains code to reproduce the spatio-temporal action detection results of TubeR: Tubelet Transformer for Video Action Detection.

Updates

08/08/2022 Initial commits

Results and Models

AVA 2.1 Dataset

Backbone	Pretrain	#view	mAP	FLOPs	config	model
CSN-50	Kinetics-400	1 view	27.2	78G	config	S3
CSN-50 (with long-term context)	Kinetics-400	1 view	28.8	78G	config	Coming soon
CSN-152	Kinetics-400+IG65M	1 view	29.7	120G	config	S3
CSN-152 (with long-term context)	Kinetics-400+IG65M	1 view	31.7	120G	config	Coming soon

AVA 2.2 Dataset

Backbone	Pretrain	#view	mAP	FLOPs	config	model
CSN-152	Kinetics-400+IG65M	1 view	31.1	120G	config	S3
CSN-152 (with long-term context)	Kinetics-400+IG65M	1 view	33.4	120G	config	Coming soon

JHMDB Dataset

Backbone	#view	[email protected]	[email protected]	config	model
CSN-152	1 view	87.4	82.3	config	S3

Usage

The project is built on GluonCV-torch. Please refer to the tutorial for details.

Dependency

The project has been tested with:

Torch 1.12 + CUDA 11.3
timm==0.4.5
tensorboardX
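To confirm that an environment matches the tested setup above, a small check along these lines could help. This is a minimal sketch, not part of the repository: it compares torch on major.minor only (installed builds usually carry extra components such as "1.12.1+cu113"), requires timm to be exactly 0.4.5, accepts any tensorboardX version, and does not check the CUDA version.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed under Dependency. None means "any installed version";
# torch is compared on major.minor only, because wheels carry extra
# components and local tags such as "1.12.1+cu113".
TESTED: Dict[str, Optional[str]] = {
    "torch": "1.12",
    "timm": "0.4.5",
    "tensorboardX": None,
}


def matches(installed: str, tested: Optional[str]) -> bool:
    """True if `installed` agrees with `tested` on every component `tested` specifies."""
    if tested is None:
        return True
    core = installed.split("+", 1)[0]  # drop local build tags like "+cu113"
    wanted = tested.split(".")
    return core.split(".")[: len(wanted)] == wanted


def report() -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each package to (installed version or None, whether it matches TESTED)."""
    result = {}
    for pkg, tested in TESTED.items():
        try:
            version = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            result[pkg] = (None, False)
        else:
            result[pkg] = (version, matches(version, tested))
    return result


if __name__ == "__main__":
    for pkg, (version, ok) in report().items():
        status = "ok" if ok else "differs from the tested setup"
        print(f"{pkg}: {version or 'not installed'} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only means the combination has not been tested.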

Dataset

Please download asset.zip and unzip it into ./datasets.
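The unzip step can also be scripted, for example when preparing several machines. This is a minimal sketch using Python's standard zipfile module, roughly equivalent to `unzip asset.zip -d ./datasets`; the function name unpack_assets is hypothetical, and the sketch makes no assumption about what asset.zip contains internally. It rejects archive members whose paths would escape the destination directory.

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_assets(archive: str = "asset.zip", dest: str = "./datasets") -> List[str]:
    """Extract `archive` into `dest` (created if missing); return the member names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    root = dest_path.resolve()
    with zipfile.ZipFile(archive) as zf:
        # Refuse members that would land outside dest (e.g. "../x" or absolute paths).
        for name in zf.namelist():
            target = (dest_path / name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        zf.extractall(dest_path)
        return zf.namelist()
```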

[AVA] Please refer to DATASET.md for downloading and pre-processing the AVA dataset.

[JHMDB] Please refer to JHMDB for the JHMDB dataset and to the Dataset Section for the UCF dataset. You can also refer to ACT-Detector to prepare these two datasets.

Inference

To run inference, first modify the config file:

Set the correct WORLD_SIZE, GPU_WORLD_SIZE, DIST_URL and WOLRD_URLS according to your experiment setup.
Set LABEL_PATH, ANNO_PATH and DATA_PATH to your local directories.
Download the pre-trained model and set PRETRAINED_PATH to the model path.
Make sure LOAD and LOAD_FC are set to True.
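The config edits above can be applied programmatically instead of by hand. This is a minimal sketch, assuming the config files are plain YAML and that PyYAML is installed. Because the nesting of these keys inside the YAML files is not described here, the hypothetical helper set_everywhere searches the whole tree and sets every occurrence of a key, which assumes each key name is either unique or should take the same value everywhere it appears.

```python
from typing import Any, Dict


def set_everywhere(cfg: Dict[str, Any], key: str, value: Any) -> int:
    """Set every occurrence of `key` in the nested mapping `cfg`; return the count."""
    count = 0
    for k, v in cfg.items():
        if k == key:
            cfg[k] = value  # replacing an existing key's value is safe while iterating
            count += 1
        elif isinstance(v, dict):
            count += set_everywhere(v, key, value)
    return count


def apply_overrides(src: str, dst: str, overrides: Dict[str, Any]) -> None:
    """Load the YAML config at `src`, apply `overrides`, and write the result to `dst`."""
    import yaml  # PyYAML, assumed to be installed

    with open(src) as f:
        cfg = yaml.safe_load(f)
    for key, value in overrides.items():
        if set_everywhere(cfg, key, value) == 0:
            raise KeyError(f"{key} not found in {src}")
    with open(dst, "w") as f:
        # safe_dump does not preserve comments from the original file.
        yaml.safe_dump(cfg, f, sort_keys=False)
```

For inference, the overrides would set LOAD and LOAD_FC to True and point PRETRAINED_PATH at the downloaded model; for training from scratch, LOAD and LOAD_FC would be False. Writing to a separate output file keeps the shipped config untouched.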

Then run:

# run evaluation on AVA (use eval_tuber_jhmdb.py for JHMDB)
python3 eval_tuber_ava.py <CONFIG_FILE>

# for example, to evaluate the CSN-152 model on AVA 2.1, run:
python3 eval_tuber_ava.py configuration/TubeR_CSN152_AVA21.yaml
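The repository root ships four entry-point scripts: eval_tuber_ava.py, eval_tuber_jhmdb.py, train_tuber_ava.py and train_tuber_jhmdb.py. If you launch runs from another tool, a small wrapper that picks the right script by stage and dataset avoids calling a script that does not exist. This is a minimal sketch; the names build_command and run are hypothetical, and it assumes each script takes the config file as its only argument, as in the commands shown in this README.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Entry-point scripts at the repository root, keyed by (stage, dataset).
SCRIPTS = {
    ("eval", "ava"): "eval_tuber_ava.py",
    ("eval", "jhmdb"): "eval_tuber_jhmdb.py",
    ("train", "ava"): "train_tuber_ava.py",
    ("train", "jhmdb"): "train_tuber_jhmdb.py",
}


def build_command(stage: str, dataset: str, config: str) -> List[str]:
    """Return the argv list for `stage` ("eval"/"train") on `dataset` ("ava"/"jhmdb")."""
    try:
        script = SCRIPTS[(stage, dataset)]
    except KeyError:
        raise ValueError(f"no script for stage={stage!r}, dataset={dataset!r}") from None
    # Use the current interpreter so the same virtual environment is reused.
    return [sys.executable, script, config]


def run(stage: str, dataset: str, config: str, repo_root: str = ".") -> int:
    """Run the selected script from `repo_root` and return its exit code."""
    if not Path(repo_root, config).is_file():
        raise FileNotFoundError(config)
    return subprocess.run(build_command(stage, dataset, config), cwd=repo_root).returncode
```

For example, run("eval", "ava", "configuration/TubeR_CSN152_AVA21.yaml") executes the same command as the example above.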

Training

To train TubeR from scratch, first modify the config file:

Set the correct WORLD_SIZE, GPU_WORLD_SIZE, DIST_URL and WOLRD_URLS according to your experiment setup.
Set LABEL_PATH, ANNO_PATH and DATA_PATH to your local directories.
Download the pre-trained backbone and transformer weights, and set PRETRAIN_BACKBONE_DIR (CSN-50, CSN-152) and PRETRAIN_TRANSFORMER_DIR (DETR) accordingly.
Make sure LOAD and LOAD_FC are set to False.
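Inference and training expect opposite values for LOAD and LOAD_FC, so a pre-flight check before launching a long run can catch a config left in the wrong mode. This is a minimal sketch, assuming the config has already been loaded into a nested dict (for example with PyYAML). The helper names are hypothetical, and because the nesting of these keys is not described here, the check searches the whole tree for each key. It also assumes the path keys named in the steps above should point to something that exists on disk.

```python
from pathlib import Path
from typing import Any, Dict, List

# Required flag values and path keys for each mode, per the Inference and Training steps.
REQUIRED_FLAGS = {
    "eval": {"LOAD": True, "LOAD_FC": True},
    "train": {"LOAD": False, "LOAD_FC": False},
}
REQUIRED_PATHS = {
    "eval": ["PRETRAINED_PATH"],
    "train": ["PRETRAIN_BACKBONE_DIR", "PRETRAIN_TRANSFORMER_DIR"],
}


def find_values(cfg: Dict[str, Any], key: str) -> List[Any]:
    """Collect the value of every occurrence of `key` in a nested mapping."""
    found = []
    for k, v in cfg.items():
        if k == key:
            found.append(v)
        elif isinstance(v, dict):
            found.extend(find_values(v, key))
    return found


def check_config(cfg: Dict[str, Any], mode: str) -> List[str]:
    """Return human-readable problems; an empty list means `cfg` looks ready for `mode`."""
    problems = []
    for key, expected in REQUIRED_FLAGS[mode].items():
        values = find_values(cfg, key)
        if not values:
            problems.append(f"{key} is missing")
        elif any(v is not expected for v in values):
            problems.append(f"{key} should be {expected} for {mode}, found {values}")
    for key in REQUIRED_PATHS[mode]:
        values = find_values(cfg, key)
        if not values:
            problems.append(f"{key} is missing")
        for v in values:
            if not Path(str(v)).exists():
                problems.append(f"{key} points to a missing path: {v}")
    return problems
```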

Then run:

# run training from scratch on AVA (use train_tuber_jhmdb.py for JHMDB)
python3 train_tuber_ava.py <CONFIG_FILE>

# for example, to train the CSN-152 model on AVA 2.1 from scratch, run:
python3 train_tuber_ava.py configuration/TubeR_CSN152_AVA21.yaml

TODO

[ ] Add a tutorial and pre-trained weights for TubeR with long-term memory

[ ] Add weights for UCF24

Citing TubeR

@inproceedings{zhao2022tuber,
  title={TubeR: Tubelet transformer for video action detection},
  author={Zhao, Jiaojiao and Zhang, Yanyi and Li, Xinyu and Chen, Hao and Shuai, Bing and Xu, Mingze and Liu, Chunhui and Kundu, Kaustav and Xiong, Yuanjun and Modolo, Davide and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13598--13607},
  year={2022}
}
