Non-Rigid Structure from Motion (NRSfM), which reconstructs non-rigid 3D structure from real-world monocular image sequences, is a notable problem in computer vision due to its broad applicability, such as to scene understanding, human behaviour analysis, smart surveillance systems, and advanced human-computer interfaces. Existing NRSfM methods model non-rigid motion using one of two dual approaches: a low-rank shape basis or a low-rank trajectory basis. The shape basis method generalizes poorly to different non-rigid objects and to complex non-rigid motion sequences, such as a sequence containing multiple human actions that lie in multiple shape subspaces. The trajectory basis method is shape independent, but it reconstructs poorly under realistic camera motion (slow and smooth motion).
To address these problems, we will explore two directions. Firstly, the performance of trajectory basis NRSfM relies on two inherently conflicting factors: (i) the conditioning of the composed camera and trajectory basis matrix, and (ii) whether the trajectory basis has enough degrees of freedom to model the 3D point trajectories. Employing a small, low-capacity trajectory basis reduces the likelihood of an ill-conditioned system (when composed with the camera) during reconstruction. However, it also reduces the capability of the basis to model the object's "true" 3D point trajectories. We propose a strategy for learning a more compact trajectory basis, using convolutional sparse coding on corpora of naturally occurring point trajectories, to increase the likelihood that the trajectory basis matrix composed with the camera is well conditioned for a broad class of point trajectories and camera motions. Furthermore, we explore methods for handling an extremely slow-moving camera by combining articulation constraints with the trajectory basis.
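The conflict between factors (i) and (ii) can be illustrated numerically. The following is a minimal sketch, not the proposed method: it assumes a fixed DCT trajectory basis (rather than a learned one), orthographic cameras rotating about the vertical axis, and hypothetical helper names (`dct_basis`, `orthographic_cameras`, `composed_condition`). It reports the condition number of the camera matrix composed with the trajectory basis as the basis size and the amount of camera rotation vary.

```python
import numpy as np


def dct_basis(num_frames, basis_size):
    """First `basis_size` orthonormal DCT-II vectors as columns (F x K)."""
    t = np.arange(num_frames)[:, None]
    k = np.arange(basis_size)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * num_frames))
    basis[:, 0] *= np.sqrt(1.0 / num_frames)
    basis[:, 1:] *= np.sqrt(2.0 / num_frames)
    return basis


def orthographic_cameras(num_frames, total_angle):
    """Block-diagonal (2F x 3F) matrix of per-frame 2x3 orthographic cameras.

    Assumption: the camera rotates uniformly about the y axis, sweeping
    `total_angle` radians over the whole sequence.
    """
    angles = np.linspace(0.0, total_angle, num_frames)
    cameras = np.zeros((2 * num_frames, 3 * num_frames))
    for t, angle in enumerate(angles):
        c, s = np.cos(angle), np.sin(angle)
        # First two rows of a rotation about the y axis.
        cameras[2 * t:2 * t + 2, 3 * t:3 * t + 3] = [[c, 0.0, s],
                                                     [0.0, 1.0, 0.0]]
    return cameras


def composed_condition(num_frames, basis_size, total_angle):
    """Condition number of the camera matrix composed with the basis."""
    # Lift the basis so it acts on stacked (x, y, z) coordinates per frame.
    lifted_basis = np.kron(dct_basis(num_frames, basis_size), np.eye(3))
    system = orthographic_cameras(num_frames, total_angle) @ lifted_basis
    return np.linalg.cond(system)


if __name__ == "__main__":
    for total_angle in (np.pi, 0.1):
        for basis_size in (2, 5, 10):
            cond = composed_condition(50, basis_size, total_angle)
            print(f"angle={total_angle:.2f} rad, K={basis_size}: cond={cond:.3g}")
```

Because the DCT vectors are nested, enlarging the basis can only keep or worsen the conditioning of this composed system, while a camera that does not rotate at all leaves the depth coefficients unobservable.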
Secondly, we propose to improve the current state-of-the-art shape basis NRSfM method of Dai et al., which assumes that the non-rigid motion lies in a single low-rank shape subspace. This assumption is a limitation: we have found empirically that the method reconstructs complex motions poorly (e.g., a human subject performing a sequence of primitive actions such as walk, sit, and stand). To circumvent this limitation, we propose modelling a complex motion as a union of shape subspaces. Our approach also clusters complex non-rigid motions into their constituent subspaces, which can be used for temporal human action segmentation and recognition.
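A toy example may help show why a single low-rank subspace is restrictive. The sketch below uses purely synthetic data and assumed names (`sample_action`, a frames-by-coordinates layout with one vectorised shape per row). It is not the proposed reconstruction method. Two "actions" are each drawn from their own rank-K shape subspace, and the concatenated sequence has a higher rank than either action alone.

```python
import numpy as np


def sample_action(num_frames, rank, num_points, rng):
    """Synthetic action: every frame is a combination of `rank` basis shapes.

    Returns a (num_frames x 3*num_points) matrix, one vectorised shape per row.
    """
    basis_shapes = rng.standard_normal((rank, 3 * num_points))
    coefficients = rng.standard_normal((num_frames, rank))
    return coefficients @ basis_shapes


rng = np.random.default_rng(0)
rank, num_points, frames_per_action = 3, 30, 40

# Two hypothetical primitive actions, each in its own shape subspace.
walk = sample_action(frames_per_action, rank, num_points, rng)
sit = sample_action(frames_per_action, rank, num_points, rng)
sequence = np.vstack([walk, sit])  # walk followed by sit

print("rank of walk segment: ", np.linalg.matrix_rank(walk))
print("rank of sit segment:  ", np.linalg.matrix_rank(sit))
print("rank of full sequence:", np.linalg.matrix_rank(sequence))
```

Each segment is well described by a rank-K model, but fitting one rank-K subspace to the whole sequence cannot capture both actions, which is the motivation for treating the sequence as a union of subspaces.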