Tracking the full body of a human from intensity images is a challenging task because appearance-based features vary widely with illumination and pose. To overcome this problem, the authors propose a graph-based approach using depth images for pose-invariant human body tracking.
The authors build a human skeleton for each frame of a combined depth and RGB (RGB-D) video. First, the human body is segmented from the depth map and represented as a 3D point cloud. The points form the vertices of a graph whose edges connect spatially neighboring points. From this graph, a geodesic distance map is computed from the body center to all other body points (see the sketch below). Landmarks such as the head, knees, and shoulders are detected from this map, and a skeleton is then fitted to the detected landmarks. Because some body parts may be occluded in the depth map, the authors predict the positions of those parts using motion estimation based on optical flow computed from the RGB frames.
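The paper itself gives no code; as a rough illustration of the geodesic-map step, the following Python sketch builds a k-nearest-neighbor graph over the body point cloud and runs Dijkstra's algorithm from the body center. The choice of k, the random stand-in point cloud, and the simple farthest-point landmark heuristic are all illustrative assumptions, not the authors' exact method.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def build_body_graph(points, k=8):
    """Connect each 3D point to its k nearest spatial neighbors,
    weighting edges by Euclidean distance."""
    tree = cKDTree(points)
    dists, idxs = tree.query(points, k=k + 1)  # column 0 is the point itself
    n = len(points)
    rows = np.repeat(np.arange(n), k)
    return csr_matrix((dists[:, 1:].ravel(), (rows, idxs[:, 1:].ravel())),
                      shape=(n, n))

def geodesic_distance_map(points, center_idx, k=8):
    """Geodesic distance from the body center to every other point,
    computed as shortest paths on the neighborhood graph."""
    graph = build_body_graph(points, k)
    return dijkstra(graph, directed=False, indices=center_idx)

# Extremities (head, hands, feet) tend to be the points farthest from the
# body center along the surface, so the largest geodesic distances give
# landmark candidates.
points = np.random.rand(2000, 3)  # stand-in for a segmented body point cloud
center = np.argmin(np.linalg.norm(points - points.mean(axis=0), axis=1))
dist_map = geodesic_distance_map(points, center)
dist_map[~np.isfinite(dist_map)] = -np.inf  # ignore disconnected stray points
landmark_candidates = np.argsort(dist_map)[-20:]
```

The appeal of geodesic rather than Euclidean distance is that surface distances from the torso to the extremities stay roughly constant as the pose changes, which is what makes the landmark detection pose-invariant.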
The proposed algorithm was tested on depth data captured by a time-of-flight (ToF) camera as well as a Kinect sensor. The authors found the Kinect data to be more stable than the ToF data, which is encouraging for researchers who use the Kinect for its low cost. However, the algorithm is computationally expensive and fails when the subject rotates fully about the vertical axis.
Overall, this is a solid marker-free approach to human skeleton tracking. The strategy can be applied in areas such as surveillance, virtual reality, and medical diagnostics. Researchers working on depth-based human pose estimation and activity tracking would find this paper particularly informative.