Human Motion Prediction with Scene Context

We sought to train a human motion predictor: it takes a time sequence of a 25-joint skeleton and predicts future joint locations. See final_report.pdf for our full writeup.

PROX dataset

We used the PROX dataset for training and validation. We originally wished to incorporate proximity maps as model inputs; these would provide scene context to inform better predictions (e.g. rolling around on a bed, sliding an arm across a table, leaning against a wall). However, we ran out of time to adapt/train a joint model.

We wrote dataloaders for the various RGB image, depth image, and joint-location files; see benji_prox_dataloader.py. To use them you will have to download the dataset yourself and provide a local path to it.
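A minimal usage sketch (the class name, constructor arguments, and batch layout here are illustrative assumptions, not the actual API - check benji_prox_dataloader.py for the real names):

from torch.utils.data import DataLoader
from benji_prox_dataloader import ProxJointsDataset  # hypothetical class name

# 5 input frames and 10 prediction frames, matching our experimental setup
dataset = ProxJointsDataset(prox_root='/path/to/PROX', in_frames=5, pred_frames=10)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
past, future = next(iter(loader))  # e.g. past: (B, 5, 25, 3), future: (B, 10, 25, 3)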

SMPLX model

The PROX dataset provides human joint locations (along with colour and depth images, scene meshes, etc.) in the form of SMPLX parameters. We downloaded and placed a SMPLX model in ./models_smplx_v1_1 for working with this data, as well as for visualisation. See the SMPLX website for download instructions.
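As a minimal sketch, the smplx package can then build a body model from these files (the exact path passed to smplx.create depends on how you unpack the download):

import smplx

# Build a neutral-gender SMPL-X body model from the downloaded model files
model = smplx.create('./models_smplx_v1_1/models', model_type='smplx', gender='neutral')
output = model()  # forward pass with the default (zero) pose
joints = output.joints  # (1, num_joints, 3) tensor of 3D joint locations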

Python Requirements

Installation of the various packages is complicated, but roughly:

torch==1.10.0
torchvision==0.11.1
torchaudio
smplx[all]
open3d==0.9.0.0
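For example, with pip (exact wheels depend on your platform and CUDA version):

pip install torch==1.10.0 torchvision==0.11.1 torchaudio
pip install "smplx[all]" open3d==0.9.0.0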

On the Euler cluster we used a virtual environment together with the loaded modules gcc/6.3.0 python_gpu/3.8.5 cuda/10.2.89 mesa/18.3.6 open3d/0.9.0 eth_proxy.
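In practice that setup looks something like (the virtual-environment commands are the standard ones, not copied from our scripts):

module load gcc/6.3.0 python_gpu/3.8.5 cuda/10.2.89 mesa/18.3.6 open3d/0.9.0 eth_proxy
python -m venv venv && source venv/bin/activate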

Results

We show some qualitative results on our validation set, which contains 9 videos across 2 scenes: N3OpenArea and BasementSittingBooth. We input 5 frames spanning 1 second of video and predict 10 frames spanning the following 2 seconds. The green skeleton is the ground truth; the red one is the prediction.

RNN

The RNN does not perform amazingly; our model-training history was messy and somewhat ad hoc. This is the best we achieved on short-horizon prediction, though we also tried predicting longer sequences, which is harder. It does at least beat naive replication of the final input frame!

Predicted: a lying human sits up.

Human walking.

The model assumes a static standing human will start moving forwards - but with floating rather than walking feet.

Some more examples, from the PROX dataset RGB camera's point of view.

Transformer

The transformer model is able to predict natural human movement trajectories and body shapes, but cannot follow the ground-truth trajectories accurately.

Visualization

You can visualise skeletons in 2D and 3D with benji_3d_skel_in_scene_vis.ipynb and visualise_model_preds.ipynb.

See proximity_map_demo.ipynb for loading proximity maps.

Here's a gif of what a time sequence of proximity maps looks like:

prox_map.gif

Training models

Training is done with the standalone scripts rnn_gru_joints_worldnorm.py and transformer_joints_worldnorm.py.
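Each is a standalone script, so training should be a direct invocation (any options are set inside the script itself):

python rnn_gru_joints_worldnorm.py
python transformer_joints_worldnorm.py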

For the RNN model, you can load our best trained short-horizon (2 second) prediction model, model.pt, with PyTorch:

import torch
from pose_gru import PoseGRU_inputFC2

device = torch.device('cpu')  # or 'cuda'
save_path = 'model.pt'  # the checkpoint in this repo
gru = PoseGRU_inputFC2(input_size=(25,3), n_layers=3)
restore_dict = torch.load(save_path, map_location=device)
gru.load_state_dict(restore_dict['model_state_dict'])

You can download the pretrained transformer model here, and load it as:

import torch
from simple_transformer import PoseTransformer

device = torch.device('cpu')  # or 'cuda'
save_path = 'path/to/transformer.pt'  # wherever you saved the download
transformer = PoseTransformer(num_tokens=25*3)
restore_dict = torch.load(save_path, map_location=device)
transformer.load_state_dict(restore_dict['model_state_dict'])

Acknowledgments

Many thanks to Siwei Zhang for supervising our team for this course project, and to the TA team of the 2022 Virtual Humans course, led by Siyu Tang.

About

Course project for ETHZ Virtual Humans
