ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour
Samy Tafasca *, Anshul Gupta *, Jean-Marc Odobez (* equal contribution)
ICCV 2023
[Paper] [Video] [Dataset]
This repository provides the official code and checkpoints for the GeomGaze model, as introduced in our ICCV paper, ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour. It also includes annotations and scripts for our novel "looking at heads" (LAH) semantic metric, which evaluates gaze-following performance when people look at heads.
The GeomGaze model constructs a geometrically consistent point cloud of the scene. This point cloud is matched with a predicted 3D gaze vector to compute the 3D Field-of-View (3DFoV), highlighting visible regions in 3D. The 3DFoV is then combined with the scene image to predict the final gaze target.
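For intuition only, the sketch below shows one way such a 3D field-of-view map could be computed from a depth map and a 3D gaze vector: back-project each pixel into a point cloud, then score scene points by the angle between the gaze direction and the ray from the eye to each point. Function and variable names are illustrative assumptions, not the repository's API, and the actual GeomGaze implementation may differ.

```python
import numpy as np

def fov_3d(depth, focal, eye_3d, gaze_dir, cone_deg=60.0):
    """Toy 3D field-of-view map from a depth map and a 3D gaze direction.

    depth    : (H, W) depth map (e.g. in metres)
    focal    : focal length in pixels
    eye_3d   : (3,) eye position in camera coordinates
    gaze_dir : (3,) 3D gaze direction
    Returns an (H, W) map in [0, 1] that is high where the back-projected
    scene point lies inside a cone of half-angle `cone_deg` around the gaze.
    """
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel into a point cloud (pinhole model, principal
    # point assumed at the image centre).
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    points = np.stack([x, y, depth], axis=-1)                  # (H, W, 3)
    # Unit rays from the eye to each scene point.
    rays = points - np.asarray(eye_3d, dtype=float)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True) + 1e-8
    # Cosine similarity between the gaze direction and each ray.
    g = np.asarray(gaze_dir, dtype=float)
    cos_sim = rays @ (g / (np.linalg.norm(g) + 1e-8))
    cos_min = np.cos(np.deg2rad(cone_deg))
    return np.clip((cos_sim - cos_min) / (1.0 - cos_min), 0.0, 1.0)
```

In the paper's pipeline, a map of this kind is combined with the scene image by the prediction module; the sketch above only conveys the geometric intuition.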
Download the required datasets:
- GazeFollow extended: [Download]
- VideoAttentionTarget: [Download]
- ChildPlay: [Download]
Update the dataset paths (`*_data`) in `config.py` accordingly.
Additionally, download our processed data: [Download]
- Validation labels: found in `labels/` after extraction from the download.
  - Update `*_train_label`, `*_val_label`, and `*_test_label` in the config.
- Fixed image cropping parameters: found in `val_crop_params/` after extraction from the download.
  - Update `*_val_crop_params` in the config.
- Extract Depth Maps:
  - Use the SamsungLabs depth estimation model with `domain=depth`.
  - We use the `b5_lrn4` model.
- Extract Focal Length:
  - Use the AdelaiDepth model with a ResNeXt101 backbone.
  - Save focal lengths as separate `.txt` files per image.
  - We provide a modified inference script at `utils/test_shape.py`.
  - Optionally, approximate the focal length with the longest side of the image in pixels (there may be a loss in performance).
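If you use the fallback approximation from the last bullet above, a minimal sketch of writing per-image focal-length `.txt` files that mirror the dataset directory structure might look as follows. The roots and image extension are placeholder assumptions; adapt them to your setup.

```python
from pathlib import Path
from PIL import Image

image_root = Path("/path/to/gazefollow/images")   # hypothetical dataset image root
focal_root = Path("/path/to/gazefollow/focal")    # hypothetical output root

for img_path in image_root.rglob("*.jpg"):
    with Image.open(img_path) as img:
        # Fallback: approximate the focal length (in pixels) by the longest image side.
        focal = float(max(img.size))
    # Mirror the dataset directory structure, one .txt file per image.
    out_path = (focal_root / img_path.relative_to(image_root)).with_suffix(".txt")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(f"{focal}\n")
```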
Ensure the extracted outputs follow the dataset directory structure and update `*_depth` and `*_focal_length` in the config.
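As a reference for the config-key patterns mentioned above, a hypothetical excerpt is shown below. The exact variable names, file formats, and paths must be taken from `config.py` and your own setup; everything here is a placeholder.

```python
# config.py (hypothetical excerpt; check the actual names and formats in the file)
gazefollow_data = "/path/to/gazefollow"
gazefollow_train_label = "/path/to/labels/gazefollow_train_label"
gazefollow_val_label = "/path/to/labels/gazefollow_val_label"
gazefollow_val_crop_params = "/path/to/val_crop_params/gazefollow_val_crop_params"
gazefollow_depth = "/path/to/gazefollow/depth"
gazefollow_focal_length = "/path/to/gazefollow/focal"
```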
We use PyTorch for our experiments. Install dependencies using:
```
conda env create -f environment.yml
```
Train on GazeFollow:

```
python train.py --dataset GazeFollow
```
Train on VideoAttentionTarget:

```
python train.py --dataset VideoAtt --init_weights <path>
```

Provide initial weights from training on GazeFollow using `--init_weights`.
Train on ChildPlay:

```
python train.py --dataset ChildPlay --init_weights <path>
```

Provide initial weights from training on GazeFollow using `--init_weights`.
Test on GazeFollow:

```
python test_on_gazefollow.py --orig_ar --model_weights <path> --csv_path <csv_path>
```

Provide the model weights using `--model_weights` and the output path for predictions using `--csv_path`.
Evaluate on VideoAttentionTarget or ChildPlay:

```
python eval_on_vat_childplay.py --orig_ar --model_weights <path> --dataset <dataset> --csv_path <csv_path>
```

Specify the dataset (`ChildPlay` or `VideoAtt`) using `--dataset`.
To compute our "looking at heads" (LAH) scores:

- Download our annotations: found in `LAH_annotations/` after extraction from the download.
- Update `bbox_path` and `gt_path` in `compute_lah.py`. The `data_path` remains as per `config.py`.
- Also update `dataset`, `subset` (only for ChildPlay), and set `pred_path` to the predictions csv.
- Compute the LAH scores:

```
python compute_lah.py
```
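For intuition about what a "looking at heads" score measures, the sketch below computes the fraction of predicted gaze points that fall inside annotated head bounding boxes. This is only an illustration of the idea; the actual metric is defined in the paper and implemented in `compute_lah.py`, and the function and argument names here are hypothetical.

```python
import numpy as np

def lah_score(pred_points, head_boxes):
    """Illustrative 'looking at heads' score.

    pred_points: (N, 2) predicted gaze points (x, y), one per test sample
    head_boxes : list of length N; each entry is an (M_i, 4) array of head
                 boxes (x_min, y_min, x_max, y_max) for that sample
    Returns the fraction of predictions landing inside any annotated head box.
    """
    hits = 0
    for (px, py), boxes in zip(pred_points, head_boxes):
        boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
        inside = (boxes[:, 0] <= px) & (px <= boxes[:, 2]) & \
                 (boxes[:, 1] <= py) & (py <= boxes[:, 3])
        hits += bool(inside.any())
    return hits / max(len(pred_points), 1)
```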
Our checkpoints are available under the same download link as our processed data: [Download]
| Model | Filename |
|---|---|
| Human-centric module (update `human_centric_weights` in config) | `human_centric.pt` |
| GazeFollow pre-trained | `geomgaze_gazefollow.pt` |
| VideoAttentionTarget pre-trained | `geomgaze_vat.pt` |
| ChildPlay pre-trained | `geomgaze_childplay.pt` |
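If you want to load a checkpoint outside the provided scripts, a minimal sketch is shown below. The model class and import path are placeholders (use the model definition from this repository), and note that `train.py` and the evaluation scripts already handle `--init_weights` / `--model_weights` for you.

```python
import torch

# Placeholder import: replace with the model class defined in this repository.
from model import GeomGazeModel

model = GeomGazeModel()
state_dict = torch.load("geomgaze_gazefollow.pt", map_location="cpu")
# Some checkpoints wrap the weights in a dict, e.g. {"model": state_dict}; adjust if needed.
if isinstance(state_dict, dict) and "model" in state_dict:
    state_dict = state_dict["model"]
model.load_state_dict(state_dict, strict=False)
model.eval()
```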
If you use our code, please cite:
```
@InProceedings{Tafasca_2023_ICCV,
    author    = {Tafasca*, Samy and Gupta*, Anshul and Odobez, Jean-Marc},
    title     = {ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {20935-20946},
    note      = {* Equal contribution}
}
```
This code is adapted from our previous work:
- idiap/multimodal_gaze_target_prediction
- This work, in turn, leverages code from ejcgt/attention-target-detection.
We thank the authors for their contributions.