Real-time 3D human pose estimation, implemented in TensorFlow

XinArkh/VNect

VNect

A TensorFlow implementation of VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera.

The Caffe model/weights required by this repository are not included; please contact the authors of the paper to obtain them.

Environments

  • Python 3.x
  • tensorflow-gpu 1.x
  • pycaffe

Usage

Preparation

  1. Place the pretrained Caffe model in models/caffe_model.
  2. Run init_weights.py to generate the TensorFlow model weights.
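
What init_weights.py does internally is not shown here. As a rough illustration of one step that a Caffe-to-TensorFlow weight conversion usually needs, the sketch below reorders a convolution kernel from Caffe's (out, in, height, width) layout to TensorFlow's (height, width, in, out) layout. The function name caffe_conv_to_tf and the example kernel shape are hypothetical and not taken from this repository.

```python
import numpy as np


def caffe_conv_to_tf(weights):
    """Reorder a conv kernel from Caffe (out, in, h, w) to TensorFlow (h, w, in, out).

    Hypothetical helper for illustration; not part of this repository.
    """
    weights = np.asarray(weights)
    if weights.ndim != 4:
        raise ValueError("expected a 4-D convolution kernel")
    return np.transpose(weights, (2, 3, 1, 0))


# Illustrative shape only: 64 output channels, 3 input channels, 7x7 kernel.
caffe_kernel = np.arange(64 * 3 * 7 * 7, dtype=np.float32).reshape(64, 3, 7, 7)
tf_kernel = caffe_conv_to_tf(caffe_kernel)
print(tf_kernel.shape)
```

Only the axis order changes; the values are untouched, so a converted kernel can be checked by comparing individual entries at matching indices.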

Scripts

  1. run_estimator.py runs the estimator on a video stream.
  2. (Recommended) run_estimator_ps.py is a multiprocessing version of the same script. In run_estimator.py, the 3D plotting function may shut down occasionally, depending on the matplotlib version; run_estimator_ps.py resolves this issue.
  3. run_pic.py runs the estimator on a single picture.
  4. (Deprecated) benchmark.py is a class implementation containing all the elements needed to run the model.
  5. (Deprecated) run_estimator_robot.py additionally provides a ROS network and/or serial connection for communication in robot control.
  6. (Deprecated) The training script train.py is incomplete (I failed to reconstruct the model :( ), so do not use it. Pull requests are welcome.

[Tips] When running the video stream scripts:

  1. Click the left mouse button to initialize the bounding box, which is computed by a simple HOG method.
  2. Press any key to exit while running.

Notes

  1. In some programming environments, the 3D plotting function (matplotlib) in run_estimator.py shuts down. In that case, use run_estimator_ps.py instead.
  2. The input image is in BGR color format, and pixel values are mapped into the range [-0.4, 0.6).
  3. The joint-parent map (detailed information in materials/joint_index.xlsx):

  4. Here is a sketch showing the joint positions (don't laugh lol):

  5. Every input image is assumed to contain all 21 joints, so the model easily produces wrong results when a joint is actually not in the picture.
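
The pixel mapping described in the notes above can be sketched as follows. This is a minimal example, not the repository's own preprocessing code: the function name normalize_bgr is hypothetical, and the divisor of 256 is an assumption chosen so that 8-bit values land in the half-open range [-0.4, 0.6).

```python
import numpy as np


def normalize_bgr(img_bgr, divisor=256.0, offset=0.4):
    """Map 8-bit BGR pixel values to floats in [-offset, 1 - offset).

    Hypothetical helper; the divisor of 256 is an assumption, not taken
    from the repository.
    """
    img = np.asarray(img_bgr, dtype=np.float32)
    return img / divisor - offset


# A single BGR pixel covering the lowest, middle and highest 8-bit values.
frame = np.array([[[0, 128, 255]]], dtype=np.uint8)
normalized = normalize_bgr(frame)
print(normalized.min(), normalized.max())
```

The channel order is left as is, since images read with OpenCV are already in BGR; an image loaded in RGB order would need its channels reversed first.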

About Training Data

For the MPI-INF-3DHP dataset, refer to another repository of mine.

Reference Repositories