rwightman · Dec 20, 2019 · Dec 23, 2019
diff --git a/.gitignore b/.gitignore
@@ -1,8 +1,14 @@
 images/*
 output/*
+output_old/*
 .idea/*
 .idea
 _models/*
+_tf_models/*
+_tfjs_models/*
+_posenet_weights/*
+docker/requirements.txt
+*.mp4
 
 # Byte-compiled / optimized / DLL files
 __pycache__/
@@ -174,6 +180,7 @@ cmake-build-*/
 
 # IntelliJ
 out/
+output.txt
 
 # mpeltonen/sbt-idea plugin
 .idea_modules/
@@ -194,4 +201,4 @@ fabric.properties
 .idea/httpRequests
 
 # Android studio 3.1+ serialized cache file
-.idea/caches/build_file_checksums.ser
+.idea/caches/build_file_checksums.ser
diff --git a/NOTICE.txt b/NOTICE.txt
@@ -1,11 +1,9 @@
 PoseNet Python
 Copyright 2018 Ross Wightman
 
-Posenet tfjs converter (code in posenet/converter)
-Copyright (c) 2017 Infocom TPO (https://lab.infocom.co.jp/)
-Modified (c) 2018 Ross Wightman
+Numerous refactorings of PoseNet Python
+Copyright 2020 Peter Rigole
 
 tfjs PoseNet weights and original JS code
 Copyright 2018 Google LLC. All Rights Reserved.
-
-
+(https://github.com/tensorflow/tfjs-models | Apache License 2.0)
diff --git a/README.md b/README.md
@@ -1,69 +1,122 @@
 ## PoseNet Python
 
-This repository contains a pure Python implementation (multi-pose only) of the Google TensorFlow.js Posenet model. For a (slightly faster) PyTorch implementation that followed from this, see (https://github.com/rwightman/posenet-pytorch)
+This repository originates from [rwightman/posenet-python](https://github.com/rwightman/posenet-python) and has been
+heavily refactored to:
+ * run the PoseNet v2 networks
+ * work with the latest tfjs graph serialization
+ * add the ResNet50 network
+ * run on TF 2.x
+ * run all code in Docker containers for ease of use and installation (no conda necessary)
+
+This repository contains a pure Python implementation (multi-pose only) of the Google TensorFlow.js Posenet model. 
+For a (slightly faster) PyTorch implementation that followed from this,
+see [rwightman/posenet-pytorch](https://github.com/rwightman/posenet-pytorch).
+
 
-I first adapted the JS code more or less verbatim and found the performance was low so made some vectorized numpy/scipy version of a few key functions (named `_fast`).
+### Install
 
-Further optimization is possible
-* The base MobileNet models have a throughput of 200-300 fps on a GTX 1080 Ti (or better)
-* The multi-pose post processing code brings this rate down significantly. With a fast CPU and a GTX 1080+:
-  * A literal translation of the JS post processing code dropped performance to approx 30fps
-  * My 'fast' post processing results in 90-110fps
-* A Cython or pure C++ port would be even better...  
+A suitable Python 3.x environment with TensorFlow 2.x is required. For a quick setup, use Docker.
 
-### Install
+If you want to use the webcam demo, the pip version of OpenCV (`pip install opencv-python`) is required instead of
+the conda version: Anaconda's default OpenCV does not include FFmpeg/VideoCapture support. You may also have to
+force-install version 3.4.x, as 4.x has a broken drawKeypoints binding.
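To check whether an installed OpenCV build suits the webcam demo, you can look at `cv2.__version__` and at the FFMPEG entry in the build summary returned by `cv2.getBuildInformation()`. The sketch below is a minimal, hypothetical helper set (the function names are not part of this repository), and the parsing assumes the usual `FFMPEG: YES/NO` line format of that summary:

```python
# Hypothetical helpers for checking an OpenCV install before running the webcam demo.


def opencv_has_ffmpeg(build_info: str) -> bool:
    """Return True if an OpenCV build summary reports 'FFMPEG: YES'."""
    for line in build_info.splitlines():
        key, _, value = line.strip().partition(":")
        if key.strip() == "FFMPEG":
            return value.strip().upper().startswith("YES")
    return False


def is_opencv_34(version: str) -> bool:
    """Return True for OpenCV 3.4.x version strings such as '3.4.5'."""
    return version.startswith("3.4.")


def check_opencv() -> dict:
    """Inspect the installed cv2 module (imported lazily so the helpers above work without it)."""
    import cv2

    return {
        "version": cv2.__version__,
        "ffmpeg": opencv_has_ffmpeg(cv2.getBuildInformation()),
        "is_3_4": is_opencv_34(cv2.__version__),
    }
```

Calling `check_opencv()` in your environment should report `"ffmpeg": True` before you try the webcam demo.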
 
-A suitable Python 3.x environment with a recent version of Tensorflow is required.
+For a quick setup, see the Docker section below. If you prefer conda, see the `requirements.txt` file for what you
+should install. Note that this project relies on https://github.com/patlevin/tfjs-to-tf to convert the TensorFlow.js
+serialization to a TensorFlow SavedModel, so you also have to install this package:
 
-Development and testing was done with Conda Python 3.6.8 and Tensorflow 1.12.0 on Linux.
+```bash
+git clone https://github.com/patlevin/tfjs-to-tf.git 
+cd tfjs-to-tf 
+pip install . --no-deps 
+```
 
-Windows 10 with the latest (as of 2019-01-19) 64-bit Python 3.7 Anaconda installer was also tested.
+Use the `--no-deps` flag to prevent tfjs-to-tf from installing TensorFlow 1.x, which would replace your
+TensorFlow 2.x installation!
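After installing tfjs-to-tf, a quick sanity check is to confirm that TensorFlow is still a 2.x release, similar to the version assert in `benchmark.py`. A minimal sketch (both function names are hypothetical):

```python
# Hypothetical post-install check: tfjs-to-tf must not have replaced TensorFlow 2.x.


def is_tensorflow_2(version: str) -> bool:
    """Return True for TensorFlow 2.x version strings such as '2.1.0' or '2.0.0-rc1'."""
    return version.split(".", 1)[0] == "2"


def check_installed_tensorflow() -> str:
    """Import TensorFlow lazily and fail loudly if it is not a 2.x release."""
    import tensorflow as tf

    if not is_tensorflow_2(tf.__version__):
        raise RuntimeError(
            "TensorFlow %s is installed but 2.x is required; "
            "was tfjs-to-tf installed without --no-deps?" % tf.__version__
        )
    return tf.__version__
```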
 
-If you want to use the webcam demo, a pip version of opencv (`pip install opencv-python`) is required instead of the conda version. Anaconda's default opencv does not include ffpmeg/VideoCapture support. Also, you may have to force install version 3.4.x as 4.x has a broken drawKeypoints binding.
 
-A conda environment setup as below should suffice: 
-```
-conda install tensorflow-gpu scipy pyyaml python=3.6
-pip install opencv-python==3.4.5.20
+### Using Docker 
 
-```
+The most convenient way to run this project is to build and run the Docker image, because it has all the requirements
+built in.
+The GPU version has been tested on a Linux machine. You need to install the NVIDIA host driver and the NVIDIA Container
+Toolkit. Once set up, you can build as many images as you want with different dependencies without touching your host
+OS (or fiddling with conda).
 
-### Usage
+If you just want to test this code, you can run everything on a CPU as well. You still get 8 FPS with MobileNet and
+4 FPS with ResNet50. Replace `GPU` with `CPU` in the commands below to run on a CPU.
+
+```bash
+cd docker
+./docker_img_build.sh GPU
+cd ..  
+. ./bin/exportGPU.sh
+./bin/get_test_images_run.sh
+./bin/image_demo_run.sh
+``` 
+
+Some pointers to get you going on the Linux machine setup. Most links are based on Ubuntu, but other distributions 
+should work fine as well. 
+* [Install docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/ )
+* [Install the NVIDIA host driver](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation)
+  * remember to reboot here
+* [Install the NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker)
+* check your installation: `docker run --gpus all nvidia/cuda nvidia-smi`
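All `bin/*_run.sh` wrappers go through `bin/docker_run.sh`, which mounts the current directory at `/work` and selects the image and GPU flags from `POSENET_PYTHON_DEVICE`. For illustration only, the same selection logic can be sketched in Python (the function is hypothetical; the image names and flags are copied from `docker_run.sh`):

```python
# Illustrative mirror of the image/flag selection in bin/docker_run.sh.
import shlex
from typing import List


def docker_run_command(device: str, workdir: str, *cmd: str) -> List[str]:
    """Build the `docker run` argument list that docker_run.sh would execute.

    As in the script, any device value other than "GPU" falls back to the CPU image.
    """
    if device == "GPU":
        image, gpu_opts = "posenet-python-gpu", ["--gpus", "all"]
    else:
        image, gpu_opts = "posenet-python-cpu", []
    return ["docker", "run", *gpu_opts, "-it", "--rm", "-v", workdir + ":/work", image, *cmd]


if __name__ == "__main__":
    args = docker_run_command("GPU", "/home/me/posenet-python", "python", "benchmark.py")
    # prints: docker run --gpus all -it --rm -v /home/me/posenet-python:/work posenet-python-gpu python benchmark.py
    print(" ".join(shlex.quote(a) for a in args))
```

This is also why the wrappers have to be started from the project root: the mounted directory is whatever `pwd` returns.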
 
-There are three demo apps in the root that utilize the PoseNet model. They are very basic and could definitely be improved.
 
-The first time these apps are run (or the library is used) model weights will be downloaded from the TensorFlow.js version and converted on the fly.
+### Usage
+
+There are three demo apps in the root that utilize the PoseNet model. They are very basic and could definitely be 
+improved.
 
-For all demos, the model can be specified with the '--model` argument by using its ordinal id (0-3) or integer depth multiplier (50, 75, 100, 101). The default is the 101 model.
+The first time these apps are run (or the library is used) model weights will be downloaded from the TensorFlow.js 
+version and converted on the fly.
 
 #### image_demo.py 
 
-Image demo runs inference on an input folder of images and outputs those images with the keypoints and skeleton overlayed.
+The image demo runs inference on an input folder of images and outputs those images with the keypoints and skeleton
+overlaid.
 
-`python image_demo.py --model 101 --image_dir ./images --output_dir ./output`
+`python image_demo.py --model resnet50 --stride 16 --image_dir ./images --output_dir ./output`
 
 A folder of suitable test images can be downloaded by first running the `get_test_images.py` script.
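The demo scripts take `--model`, `--stride`, `--quant_bytes` and `--multiplier` (see `benchmark.py`). The comments there state that the stride is 8, 16 or 32 with a maximum of 16 for MobileNet, and that the multiplier only applies to MobileNet. The following is a sketch of how those constraints could be enforced up front with `argparse`; this validation is an assumption added here and is not in the original scripts, which pass the values straight to `load_model`:

```python
# Hypothetical up-front validation of the model options used by the demo scripts.
import argparse

MODELS = ("mobilenet", "resnet50")
STRIDES = (8, 16, 32)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="resnet50", choices=MODELS)
    parser.add_argument("--stride", type=int, default=16, choices=STRIDES)
    parser.add_argument("--quant_bytes", type=int, default=4)  # 4 = float
    parser.add_argument("--multiplier", type=float, default=1.0)  # only for mobilenet
    return parser


def parse_model_args(argv=None) -> argparse.Namespace:
    """Parse the options and reject combinations the comments in benchmark.py rule out."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.model == "mobilenet" and args.stride > 16:
        parser.error("mobilenet supports a maximum stride of 16")
    if args.model != "mobilenet" and args.multiplier != 1.0:
        parser.error("--multiplier only applies to mobilenet")
    return args
```

`parser.error` prints a usage message and exits with status 2, so an invalid combination fails before any model is downloaded.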
 
 #### benchmark.py
 
-A minimal performance benchmark based on image_demo. Images in `--image_dir` are pre-loaded and inference is run `--num_images` times with no drawing and no text output.
+A minimal performance benchmark based on image_demo. Images in `--image_dir` are pre-loaded and inference is 
+run `--num_images` times with no drawing and no text output.
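In essence, the benchmark preloads the images, cycles through them `--num_images` times and divides by the elapsed time. Below is a minimal, framework-independent sketch of that loop, where `estimate` stands in for `posenet.estimate_multiple_poses`. The warm-up step is an assumption added here to keep one-time setup cost out of the timing; the original benchmark does not warm up.

```python
# Sketch of the benchmark loop: preload images, cycle through them, report the average FPS.
import time
from typing import Any, Callable, Sequence


def measure_fps(estimate: Callable[[Any], Any], images: Sequence[Any],
                num_images: int, warmup: int = 1) -> float:
    """Run `estimate` num_images times over the preloaded images and return frames per second."""
    if not images:
        raise ValueError("no images to benchmark")
    # Warm-up calls (not in benchmark.py) are excluded from the measurement.
    for i in range(warmup):
        estimate(images[i % len(images)])
    start = time.perf_counter()  # monotonic, high-resolution clock
    for i in range(num_images):
        estimate(images[i % len(images)])
    return num_images / (time.perf_counter() - start)
```

With the model returned by `load_model(...)`, you would pass `posenet.estimate_multiple_poses` together with a list of `cv2.imread` results.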
+
+Cycling 1000 times through the example images on a GeForce GTX 1080 Ti with TF 2.0.0 gives these average frame
+rates:
+
+```
+ResNet50 stride 16: 32.41 FPS
+ResNet50 stride 32: 38.70 FPS 
+MobileNet stride 8: 37.90 FPS (surprisingly slow for MobileNet; same result over several runs)
+MobileNet stride 16: 58.64 FPS
+```
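Because these are averages over the whole loop, it can help to read them as per-frame latency, which is 1000 / FPS milliseconds. For example, 32.41 FPS corresponds to roughly 30.9 ms per frame. A small sketch of the conversion applied to the figures above:

```python
# Convert the average FPS figures above into per-frame latency.
BENCHMARK_FPS = {
    "ResNet50 stride 16": 32.41,
    "ResNet50 stride 32": 38.70,
    "MobileNet stride 8": 37.90,
    "MobileNet stride 16": 58.64,
}


def ms_per_frame(fps: float) -> float:
    """Average time per frame in milliseconds for a given frames-per-second rate."""
    return 1000.0 / fps


for name, fps in BENCHMARK_FPS.items():
    print(f"{name}: {ms_per_frame(fps):.1f} ms/frame")
```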
+
+Higher frame rates have been reported by Ross Wightman on the original codebase in
+[rwightman/posenet-python](https://github.com/rwightman/posenet-python), so if you have a pull request that
+improves the performance of this codebase, feel free to let me know!
 
 #### webcam_demo.py
 
-The webcam demo uses OpenCV to capture images from a connected webcam. The result is overlayed with the keypoints and skeletons and rendered to the screen. The default args for the webcam_demo assume device_id=0 for the camera and that 1280x720 resolution is possible.
+The webcam demo uses OpenCV to capture images from a connected webcam. The result is overlaid with the keypoints and
+skeletons and rendered to the screen. The default arguments of webcam_demo assume device_id=0 for the camera and a
+supported resolution of 1280x720.
 
 ### Credits
 
-The original model, weights, code, etc. was created by Google and can be found at https://github.com/tensorflow/tfjs-models/tree/master/posenet
+The original model, weights, code, etc. were created by Google and can be found at
+https://github.com/tensorflow/tfjs-models/tree/master/posenet
 
-This port and my work is in no way related to Google.
+This port was initially created by Ross Wightman, later upgraded by Peter Rigole, and is in no way related to Google.
 
-The Python conversion code that started me on my way was adapted from the CoreML port at https://github.com/infocom-tpo/PoseNet-CoreML
+The Python conversion code that started me on my way was adapted from the CoreML port at 
+https://github.com/infocom-tpo/PoseNet-CoreML
 
-### TODO (someday, maybe)
-* More stringent verification of correctness against the original implementation
+### TODO 
 * Performance improvements (especially edge loops in 'decode.py')
 * OpenGL rendering/drawing
 * Comment interfaces, tensor dimensions, etc
-* Implement batch inference for image_demo
-
diff --git a/benchmark.py b/benchmark.py
@@ -1,49 +1,47 @@
 import tensorflow as tf
+import cv2
 import time
 import argparse
 import os
-
-import posenet
+from posenet.posenet_factory import load_model
 
 
 parser = argparse.ArgumentParser()
-parser.add_argument('--model', type=int, default=101)
+parser.add_argument('--model', type=str, default='resnet50')  # mobilenet resnet50
+parser.add_argument('--stride', type=int, default=16)  # 8, 16, 32 (max 16 for mobilenet)
+parser.add_argument('--quant_bytes', type=int, default=4)  # 4 = float
+parser.add_argument('--multiplier', type=float, default=1.0)  # only for mobilenet
 parser.add_argument('--image_dir', type=str, default='./images')
 parser.add_argument('--num_images', type=int, default=1000)
 args = parser.parse_args()
 
 
 def main():
 
-    with tf.Session() as sess:
-        model_cfg, model_outputs = posenet.load_model(args.model, sess)
-        output_stride = model_cfg['output_stride']
-        num_images = args.num_images
-
-        filenames = [
-            f.path for f in os.scandir(args.image_dir) if f.is_file() and f.path.endswith(('.png', '.jpg'))]
-        if len(filenames) > num_images:
-            filenames = filenames[:num_images]
-
-        images = {f: posenet.read_imgfile(f, 1.0, output_stride)[0] for f in filenames}
-
-        start = time.time()
-        for i in range(num_images):
-            heatmaps_result, offsets_result, displacement_fwd_result, displacement_bwd_result = sess.run(
-                model_outputs,
-                feed_dict={'image:0': images[filenames[i % len(filenames)]]}
-            )
-
-            output = posenet.decode_multiple_poses(
-                heatmaps_result.squeeze(axis=0),
-                offsets_result.squeeze(axis=0),
-                displacement_fwd_result.squeeze(axis=0),
-                displacement_bwd_result.squeeze(axis=0),
-                output_stride=output_stride,
-                max_pose_detections=10,
-                min_pose_score=0.25)
-
-        print('Average FPS:', num_images / (time.time() - start))
+    print('Tensorflow version: %s' % tf.__version__)
+    assert tf.__version__.startswith('2.'), "Tensorflow version 2.x must be used!"
+
+    model = args.model  # mobilenet resnet50
+    stride = args.stride  # 8, 16, 32 (max 16 for mobilenet)
+    quant_bytes = args.quant_bytes  # float
+    multiplier = args.multiplier  # only for mobilenet
+
+    posenet = load_model(model, stride, quant_bytes, multiplier)
+
+    num_images = args.num_images
+    filenames = [
+        f.path for f in os.scandir(args.image_dir) if f.is_file() and f.path.endswith(('.png', '.jpg'))]
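+    if not filenames:
+        raise SystemExit('No .png or .jpg images found in %s' % args.image_dir)
+    if len(filenames) > num_images:
+        filenames = filenames[:num_images]
+
+    images = {f: cv2.imread(f) for f in filenames}
+
+    # perf_counter is a monotonic, high-resolution clock intended for timing code
+    start = time.perf_counter()
+    for i in range(num_images):
+        image = images[filenames[i % len(filenames)]]
+        posenet.estimate_multiple_poses(image)
+
+    print('Average FPS:', num_images / (time.perf_counter() - start))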
 
 
 if __name__ == "__main__":

diff --git a/bin/benchmark_run.sh b/bin/benchmark_run.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+./bin/docker_run.sh python benchmark.py --model mobilenet --stride 16 --image_dir ./images --num_images 1000
diff --git a/bin/docker_run.sh b/bin/docker_run.sh
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+
+WORK=$(pwd)
+
+if [ -z "$POSENET_PYTHON_DEVICE" ]; then
+  echo "set the environment variable POSENET_PYTHON_DEVICE to CPU or GPU, or enter your choice below:"
+  read -p "Enter your device (CPU or GPU): "  device
+  if [ "$device" = "GPU" ]; then
+    source "$(dirname "$0")/exportGPU.sh"
+  elif [ "$device" = "CPU" ]; then
+    source "$(dirname "$0")/exportCPU.sh"
+  else
+    echo "Device configuration failed..."
+    exit 1
+  fi
+fi
+
+echo "device is: $POSENET_PYTHON_DEVICE"
+
+if [ "$POSENET_PYTHON_DEVICE" = "GPU" ]; then
+  image="posenet-python-gpu"
+  gpu_opts="--gpus all"
+else
+  image="posenet-python-cpu"
+  gpu_opts=""
+fi
+
+docker run $gpu_opts -it --rm -v "$WORK":/work "$image" "$@"
diff --git a/bin/exportCPU.sh b/bin/exportCPU.sh
@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+# source this file to set your environment on a CPU device
+# $ . exportCPU.sh
+export POSENET_PYTHON_DEVICE=CPU
diff --git a/bin/exportGPU.sh b/bin/exportGPU.sh
@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+# source this file to set your environment on a GPU device
+# $ . exportGPU.sh
+export POSENET_PYTHON_DEVICE=GPU
diff --git a/bin/get_test_images_run.sh b/bin/get_test_images_run.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+./bin/docker_run.sh python get_test_images.py
diff --git a/bin/image_demo_run.sh b/bin/image_demo_run.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+./bin/docker_run.sh python image_demo.py --model resnet50 --stride 16 --image_dir ./images --output_dir ./output
diff --git a/bin/inspect_saved_model.sh b/bin/inspect_saved_model.sh
@@ -0,0 +1,6 @@
+#!/usr/bin/env bash
+
+FOLDER=$1
+
+# e.g.: $> ./inspect_saved_model.sh _tf_models/posenet/mobilenet_v1_100/stride16
+./bin/docker_run.sh saved_model_cli show --dir "$FOLDER" --all
diff --git a/bin/upgrade-tf-v2.sh b/bin/upgrade-tf-v2.sh
@@ -0,0 +1,10 @@
+#!/usr/bin/env bash
+
+# run this from the top-level folder of the project
+
+WORK=$(dirname $(pwd))
+
+docker run --gpus all -it -v $WORK:/work posenet-python tf_upgrade_v2 \
+  --intree posenet-python/ \
+  --outtree posenet-python_v2/ \
+  --reportfile posenet-python/report.txt
diff --git a/bin/video_demo_run.sh b/bin/video_demo_run.sh
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+
+#./bin/docker_run.sh python video_demo.py --model resnet50 --stride 16 --input_file "Pexels Videos 3552510.mp4" --output_file "Pexels Videos 3552510-with_pose.mp4"
+./bin/docker_run.sh python video_demo.py --model resnet50 --stride 16 --input_file "exki.mp4" --output_file "exki_with_pose.mp4"
+./bin/docker_run.sh python video_demo.py --model resnet50 --stride 16 --input_file "night-bridge.mp4" --output_file "night-bridge_with_pose.mp4"
+./bin/docker_run.sh python video_demo.py --model resnet50 --stride 16 --input_file "night-colorful.mp4" --output_file "night-colorful_with_pose.mp4"
+./bin/docker_run.sh python video_demo.py --model resnet50 --stride 16 --input_file "night-street.mp4" --output_file "night-street_with_pose.mp4"
+./bin/docker_run.sh python video_demo.py --model resnet50 --stride 16 --input_file "pedestrians.mp4" --output_file "pedestrians_with_pose.mp4"
+./bin/docker_run.sh python video_demo.py --model resnet50 --stride 16 --input_file "sidewalk.mp4" --output_file "sidewalk_with_pose.mp4"
diff --git a/bin/webcam_demo_run.sh b/bin/webcam_demo_run.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+./bin/docker_run.sh python webcam_demo.py --model resnet50 --stride 16 --image_dir ./images --output_dir ./output
diff --git a/docker/Dockerfile b/docker/Dockerfile
@@ -0,0 +1,34 @@
+# default image version, override using --build-arg IMAGE_VERSION=otherversion
+ARG IMAGE_VERSION=2.1.0-py3-jupyter
+FROM tensorflow/tensorflow:$IMAGE_VERSION
+# The default version is the CPU version!
+# see: https://www.tensorflow.org/install/docker
+# see: https://hub.docker.com/r/tensorflow/tensorflow/
+
+# Install system packages
+RUN apt-get update && apt-get install -y --no-install-recommends \
+      bzip2 \
+      git \
+      wget && \
+    pip install --upgrade pip && \
+    rm -rf /var/lib/apt/lists/*
+
+COPY requirements.txt /work/
+
+WORKDIR /work
+
+# Run pip install with the '--no-deps' argument to prevent tensorflowjs from installing an old version of TensorFlow.
+# It also ensures that we know and control the transitive dependencies (although the TensorFlow Docker image comes
+# with a lot of packages pre-installed).
+RUN pip install -r requirements.txt --no-deps
+
+RUN git clone https://github.com/patlevin/tfjs-to-tf.git && \
+    cd tfjs-to-tf && \
+    git checkout v0.3.0 && \
+    pip install . --no-deps && \
+    cd .. && \
+    rm -r tfjs-to-tf
+
+ENV PYTHONPATH="/work/:${PYTHONPATH}"
+
+CMD ["bash"]