We introduce two versions of the COCO-UniHuman dataset.
COCO-UniHuman v1 is the first large-scale dataset that provides annotations for human-centric perception tasks in multi-person scenarios. The annotations include bounding boxes, keypoints, segmentation masks, SMPL parameters, and human attributes (age and gender). It extends the COCO 2017 dataset and uses the same train/val split as COCO'17.
COCO-UniHuman v2 extends COCO-UniHuman v1 by merging it with COCO-WholeBody [16] and COCO-DensePose [17], encouraging further research on multi-task human-centric perception.
Images can be downloaded from the COCO 2017 website.
COCO-UniHuman annotations for train/val can be downloaded from the download link (Google Drive).
We also provide a BaiduPan download link for the annotation files.
BaiduPan Link: https://pan.baidu.com/s/11PP70mlE03G6xoon6L7U4A
Password: a8wq
The data format is defined in DATA_FORMAT.
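Below is a minimal sketch of loading the annotations with the pycocotools COCO API, assuming the files follow the COCO-style JSON format described in DATA_FORMAT. The annotation file name used here is a placeholder, and the exact key names of the extra per-instance fields (face boxes, SMPL parameters, gender, age) are given in DATA_FORMAT.

```python
# Minimal loading sketch (assumptions: COCO-style JSON, placeholder file name).
from pycocotools.coco import COCO

# Replace with the actual name of the downloaded annotation file.
coco = COCO('annotations/coco_unihuman_train_v1.json')

# All instances belong to the 'person' category, as in COCO'17.
person_ids = coco.getCatIds(catNms=['person'])
img_ids = coco.getImgIds(catIds=person_ids)

# Inspect the annotations of the first image.
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=person_ids)
anns = coco.loadAnns(ann_ids)
ann = anns[0]
print(ann['bbox'])       # body bounding box [x, y, w, h]
print(ann['keypoints'])  # body keypoints in COCO format
# Additional fields (face box, SMPL parameters, gender, age) are stored
# per instance; see DATA_FORMAT for the exact key names.
```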
- COCO-UniHuman dataset is ONLY available for research and non-commercial use. The annotations of the COCO-UniHuman dataset belong to SenseTime Research and are licensed under a Creative Commons Attribution 4.0 License.
- For commercial use of the COCO-UniHuman annotations, please contact Mr. Sheng Jin (jinsheng13[at]foxmail[dot]com). We will send you the detailed agreement.
- We do not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. Users of the images accept full responsibility for the use of the dataset, including but not limited to the use of any copies of copyrighted images that they may create from the dataset.
Overview of representative HCP datasets. “Images”, “Instances”, and “IDs” denote the total numbers of images, instances, and identities, respectively. “Crop” indicates whether the images are cropped to the “face” or the “body”. * means head box annotation. “group:n” means age classification with n groups, “real” means real age estimation, and “appa” means apparent age estimation.
Dataset | Images | Instances | IDs | Crop | BodyBox | FaceBox | BodyKpt | BodyMask | Gender | Age | Mesh |
---|---|---|---|---|---|---|---|---|---|---|---|
Caltech [1] | 250K | 350K | 2.3K | ✗ | ✔️ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
CityPersons [2] | 5K | 32K | 32K | ✗ | ✔️ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
CrowdHuman [3] | 24K | 552K | 552K | ✗ | ✔️ | * | ✗ | ✗ | ✗ | ✗ | ✗ |
MPII [4] | 25K | 40K | - | ✗ | ✔️ | * | ✔️ | ✗ | ✗ | ✗ | ✗ |
PoseTrack [5] | 23K | 153K | - | ✗ | ✔️ | * | ✔️ | ✗ | ✗ | ✗ | ✗ |
CIHP [6] | 38K | 129K | 129K | ✗ | ✔️ | ✗ | ✗ | ✔️ | ✗ | ✗ | ✗ |
MHP [7] | 5K | 15K | 15K | ✗ | ✔️ | ✗ | ✗ | ✔️ | ✗ | ✗ | ✗ |
CelebA [8] | 200K | 200K | 10K | face | ✗ | ✗ | ✗ | ✗ | ✔️ | group:4 | ✗ |
APPA-REAL [9] | 7.5K | 7.5K | 7.5K | face | ✗ | ✗ | ✗ | ✗ | ✔️ | appa & real | ✗ |
MegaAge [10] | 40K | 40K | 40K | face | ✗ | ✗ | ✗ | ✗ | ✔️ | real | ✗ |
WIDER-Attr [11] | 13K | 57K | 57K | ✗ | ✔️ | ✗ | ✗ | ✗ | ✔️ | group:6 | ✗ |
PETA [12] | 19K | 19K | 8.7K | body | ✗ | ✗ | ✗ | ✗ | ✔️ | group:4 | ✗ |
PA-100K [13] | 100K | 100K | - | body | ✗ | ✗ | ✗ | ✗ | ✔️ | group:3 | ✗ |
OCHuman [14] | 5K | 13K | 13K | ✗ | ✔️ | ✗ | ✔️ | ✔️ | ✗ | ✗ | ✗ |
COCO [15] | 200K | 273K | 273K | ✗ | ✔️ | ✗ | ✔️ | ✔️ | ✗ | ✗ | ✗ |
COCO-WholeBody [16] | 200K | 273K | 273K | ✗ | ✔️ | ✔️ | ✔️ | ✗ | ✗ | ✗ | ✗ |
COCO-UniHuman v1 | 200K (64K) | 273K | 273K | ✗ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | appa | ✔️ |
If you use this dataset in your project, please cite the following papers:
@inproceedings{jin2024you,
title={You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception},
author={Jin, Sheng and Li, Shuhuai and Li, Tong and Liu, Wentao and Qian, Chen and Luo, Ping},
booktitle={Eur. Conf. Comput. Vis. (ECCV)},
year={2024}
}
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={Eur. Conf. Comput. Vis. (ECCV)},
pages={740--755},
year={2014}
}
@inproceedings{joo2021exemplar,
title={Exemplar fine-tuning for 3d human model fitting towards in-the-wild 3d human pose estimation},
author={Joo, Hanbyul and Neverova, Natalia and Vedaldi, Andrea},
booktitle={Int. Conf. 3D Vis. (3DV)},
pages={42--52},
year={2021}
}
If you use the v2 dataset in your project, please also cite the following papers:
@inproceedings{jin2020whole,
title={Whole-body human pose estimation in the wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Eur. Conf. Comput. Vis. (ECCV)},
pages={196--214},
year={2020}
}
@inproceedings{guler2018densepose,
title={Densepose: Dense human pose estimation in the wild},
author={G{\"u}ler, R{\i}za Alp and Neverova, Natalia and Kokkinos, Iasonas},
booktitle={IEEE Conf. Comput. Vis. Pattern Recog. (CVPR)},
pages={7297--7306},
year={2018}
}
[1] Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: A benchmark. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 304–311 (2009)
[2] Zhang, S., Benenson, R., Schiele, B.: Citypersons: A diverse dataset for pedestrian detection. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 3213–3221 (2017)
[3] Shao, S., Zhao, Z., Li, B., Xiao, T., Yu, G., Zhang, X., Sun, J.: Crowdhuman: A benchmark for detecting human in a crowd. arXiv preprint arXiv:1805.00123 (2018)
[4] Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2d human pose estimation: New benchmark and state of the art analysis. In: IEEE Conf. Comput. Vis. Pattern Recog. (2014)
[5] Andriluka, M., Iqbal, U., Insafutdinov, E., Pishchulin, L., Milan, A., Gall, J., Schiele, B.: Posetrack: A benchmark for human pose estimation and tracking. In: IEEE Conf. Comput. Vis. Pattern Recog. (2018)
[6] Gong, K., Liang, X., Li, Y., Chen, Y., Yang, M., Lin, L.: Instance-level human parsing via part grouping network. In: Eur. Conf. Comput. Vis. pp. 770–785 (2018)
[7] Li, J., Zhao, J., Wei, Y., Lang, C., Li, Y., Sim, T., Yan, S., Feng, J.: Multiple-human parsing in the wild. arXiv preprint arXiv:1705.07206 (2017)
[8] Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Int. Conf. Comput. Vis. (2015)
[9] Agustsson, E., Timofte, R., Escalera, S., Baro, X., Guyon, I., Rothe, R.: Apparent and real age estimation in still images with deep residual regressors on appa-real database. In: IEEE Int. Conf. Auto. Face & Gesture Recog. pp. 87–94 (2017)
[10] Zhang, Y., Liu, L., Li, C., Loy, C.C.: Quantifying facial age by posterior of age comparisons. In: Brit. Mach. Vis. Conf. (2017)
[11] Li, Y., Huang, C., Loy, C.C., Tang, X.: Human attribute recognition by deep hierarchical contexts. In: Eur. Conf. Comput. Vis. (2016)
[12] Deng, Y., Luo, P., Loy, C.C., Tang, X.: Pedestrian attribute recognition at far distance. In: ACM Int. Conf. Multimedia. pp. 789–792 (2014)
[13] Liu, X., Zhao, H., Tian, M., Sheng, L., Shao, J., Yan, J., Wang, X.: Hydraplus-net: Attentive deep features for pedestrian analysis. In: Int. Conf. Comput. Vis. pp. 1–9 (2017)
[14] Zhang, S.H., Li, R., Dong, X., Rosin, P., Cai, Z., Han, X., Yang, D., Huang, H., Hu, S.M.: Pose2seg: Detection free human instance segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 889–898 (2019)
[15] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Eur. Conf. Comput. Vis. (2014)
[16] Jin, S., Xu, L., Xu, J., Wang, C., Liu, W., Qian, C., Ouyang, W., Luo, P.: Whole-body human pose estimation in the wild. In: Eur. Conf. Comput. Vis. (2020)
[17] Alp Güler, R., Neverova, N., Kokkinos, I.: Densepose: Dense human pose estimation in the wild. In: IEEE Conf. Comput. Vis. Pattern Recog. (2018)