This thesis advances methods for performing markerless visual tracking of articulated bodies using one or two cameras. The research presented aims to improve upon existing Bayesian-inspired tracking methods by examining the ‘building blocks’ of these tracking algorithms, in particular the design of the measurement function, the selection of the state space, and local optimization methods. The results presented in this thesis show that improvements can be made in all of these areas. These improvements are applicable to a variety of Bayesian tracking algorithms.
This thesis begins by examining the literature relevant to the visual tracking problem. This includes the measurement functions used by other authors, with a focus on the edge detection methods used in both tracking and segmentation problems. A general overview of the global search problem follows, since a global search is a fundamental part of a Bayesian tracking algorithm. Next, the combination of Newton-like local optimization methods with the measurement functions used in visual tracking problems is examined, and it is shown that Newton optimizers are not ideally suited to these measurement functions. The Bayesian tracking framework is then detailed, along with a review of several existing Bayesian tracking algorithms. Finally, some non-Bayesian tracking algorithms are discussed.
Following the literature review, the models used in the experiments presented in this thesis are described. These include the articulated human body model, the camera model, image gradient metrics, the treatment of self-occlusion, and a generic colour-based region measurement method.
The use of graph-based approaches for edge measurements is then investigated. Graph-based methods are commonly used in image segmentation problems but have not previously been applied to visual tracking problems. A novel method is presented for performing edge measurements using the ‘shortest path’ around the object’s occluding contour. Unlike in the segmentation problem, self-occlusion models mean that the weights or costs of some graph vertices cannot be determined. Different treatments for these occluded graph vertices are given and evaluated. It is shown that the graph-based approach produces observational likelihoods that are more accurate and have significantly fewer local maxima than the edge measurement schemes previously used in tracking problems. Although this approach is computationally more expensive than other methods, it is argued that the extra cost is offset by the reduced computational expense of the global search procedure used in tracking algorithms.
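As a rough illustration of the shortest-path idea, the Python sketch below assumes a common layered construction that this summary does not specify: the projected contour is sampled at n points, each sample has k candidate edge positions along its normal, and each candidate carries a non-negative cost that is low where the image gradient is strong. Dijkstra's algorithm then finds the cheapest path that visits one candidate per sample while moving at most `max_jump` candidates between neighbouring samples. Occluded samples receive a constant cost `occluded_cost`, which is only one hypothetical treatment of occluded vertices. The names `shortest_contour_cost`, `costs`, `occluded`, `occluded_cost` and `max_jump` are illustrative, and for simplicity the path is open rather than closed around the contour.

```python
import heapq


def shortest_contour_cost(costs, occluded, occluded_cost=0.5, max_jump=1):
    """Mean cost of the cheapest path through a layered contour graph.

    Hypothetical layout: costs[i][j] is the (non-negative) cost of candidate j
    on the normal line at contour sample i; occluded[i] marks self-occluded
    samples, whose true cost is unknown.
    """
    n, k = len(costs), len(costs[0])

    def node_cost(i, j):
        # One possible treatment: occluded vertices get a fixed, neutral cost.
        return occluded_cost if occluded[i] else costs[i][j]

    visited = set()
    # Heap entries are (accumulated cost, sample index, candidate index).
    heap = [(node_cost(0, j), 0, j) for j in range(k)]
    heapq.heapify(heap)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) in visited:
            continue
        visited.add((i, j))
        if i == n - 1:
            # With non-negative costs, the first vertex popped in the last
            # layer ends a minimum-cost path.
            return d / n
        # Neighbouring samples may shift by at most max_jump candidates.
        for dj in range(-max_jump, max_jump + 1):
            jj = j + dj
            if 0 <= jj < k and (i + 1, jj) not in visited:
                heapq.heappush(heap, (d + node_cost(i + 1, jj), i + 1, jj))
    return float("inf")
```

Because every path must pass through every sample, an isolated strong edge cannot pull the score down on its own; the path has to stay connected along the contour, which is the intuition behind the smoother likelihoods claimed above.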
The choice of state space used in the tracking problems is examined next. Whereas most authors have used a state space based on the joint angles of the human body, a Cartesian state space based on the world coordinates of the limbs is proposed. While Cartesian state spaces have been used by other authors for representations of kinematic models, to the author’s knowledge they have not been used for full kinematic models. It is shown that the more linear relationship between the state variables in the Cartesian space and the 3D locations of points sampled on the object improves dynamic model predictions and principal component analysis. The Cartesian formulation is also shown to increase the linearity between the state variables and the image coordinates of the sampled points, which in turn improves the performance of local optimization methods that make localized quadratic approximations to the measurement function. Although the Cartesian space has a higher dimensionality than the rotation-based space, the geometrically plausible region of the Cartesian space has the same content (area) as the rotation space, which negates the well-known ‘curse of dimensionality’. A simple method is given to project an implausible Cartesian state onto a geometrically plausible state, together with a method to dampen the curvature of the measurement function in these implausible directions.
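One simple way to project an implausible Cartesian state back onto the plausible region, assuming the state stores the 3D position of every joint in a tree-structured skeleton with fixed limb lengths, is to walk the tree from the root and rescale each limb to its known length while keeping its direction. The sketch below is an assumed illustration, not necessarily the method given in the thesis; `project_to_plausible`, `parents` and `lengths` are hypothetical names, and the only constraint enforced is limb length (joint-angle limits are ignored).

```python
import math


def project_to_plausible(positions, parents, lengths):
    """Snap a Cartesian joint-position state onto fixed limb lengths.

    Hypothetical layout: positions[i] is the (x, y, z) position of joint i;
    parents[i] is the index of its parent joint (-1 for the root), and every
    parent appears before its children; lengths[i] is the fixed length of the
    limb from parents[i] to joint i.
    """
    out = [tuple(p) for p in positions]
    for i, p in enumerate(parents):
        if p < 0:
            continue  # the root joint keeps its position
        # Direction from the already-projected parent to the original child.
        d = [positions[i][a] - out[p][a] for a in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        if norm == 0.0:
            raise ValueError("limb direction is undefined (zero-length limb)")
        # Place the child at exactly the limb length along that direction.
        out[i] = tuple(out[p][a] + lengths[i] * d[a] / norm for a in range(3))
    return out
```

Because each joint is placed relative to its already-projected parent, a single root-to-leaf pass yields a state in which every limb has its correct length.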
Following this, a novel local optimization method is proposed. This optimization method is specific to visual tracking problems and uses the camera geometry to infer informative search directions. Treatments for choosing these search directions are given for both the monocular and the two-camera cases. A problem decomposition is also used to reduce the computational cost of the optimizer. This method is shown to outperform Newton-based optimization in a rotation-based state space. In a Cartesian state space it gives results that are at worst equivalent to those of a Newton-based approach, at a significantly reduced computational cost.
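To make the role of camera geometry concrete under a simple pinhole model, the sketch below uses a well-known property of perspective projection: displacing a 3D point along the viewing ray through the camera centre does not change its image. In the monocular case that direction is therefore poorly constrained by the measurement, while the two perpendicular directions are well constrained. How the thesis actually selects its search directions is not described in this summary; the functions `ray_aligned_directions` and `project`, and the camera placed at the origin looking down +z, are assumptions made purely for illustration. With a second camera at a different centre, the viewing ray of the first camera generally projects to a non-zero displacement in the second image.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(a):
    n = math.sqrt(sum(c * c for c in a))  # assumes a is not the zero vector
    return [c / n for c in a]


def ray_aligned_directions(camera_centre, point):
    """Return (r, u, v) for a point seen from camera_centre (hypothetical helper).

    r is the unit viewing ray through the point; u and v are unit vectors
    perpendicular to r and to each other. Moving the point along r leaves its
    pinhole image unchanged; moving along u or v does not.
    """
    r = _unit(_sub(point, camera_centre))
    # Pick a helper axis that is guaranteed not to be parallel to r.
    helper = [1.0, 0.0, 0.0] if abs(r[0]) < 0.9 else [0.0, 1.0, 0.0]
    u = _unit(_cross(r, helper))
    v = _cross(r, u)  # unit length already, since r and u are orthonormal
    return r, u, v


def project(point, f=1.0):
    """Pinhole projection for an assumed camera at the origin looking down +z."""
    x, y, z = point
    return (f * x / z, f * y / z)
```

For example, with `X = [0.2, -0.1, 2.0]` and a camera at the origin, `project` returns the same image point for `X` and for `X` moved by any amount along `r`, but a different one when `X` is moved along `u`.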
Finally, tracking results are presented for a difficult image sequence using the combination of the ideas presented in this thesis. The sequence shows a golfer performing a golf swing, a highly dynamic motion with large object velocities and accelerations.