Computer vision has emerged as one of the most prominent fields of research over the last few decades. Its applications include face recognition, pedestrian detection, action recognition and tracking. However, it remains a challenge to build effective systems that can handle occlusion, varying illumination, varying pose and other factors encountered in practical environments. Modelling video sequences by subspaces has recently shown promise for various computer vision applications, because subspaces can accommodate the effects of image variations. The set of subspaces of a fixed dimension forms a curved, non-Euclidean Riemannian manifold known as a Grassmann manifold.
In this work, our aim is to address three predominant tasks that underpin many computer vision applications, such as content-based video analysis, security and surveillance, human-computer interaction, event analysis, human behaviour analysis and video retrieval. More specifically, we address the challenging and fundamental tasks of:
1. Visual recognition
2. Clustering
3. Visual tracking
To address the first task, we propose to embed Grassmann manifolds into Reproducing Kernel Hilbert Spaces (RKHS) and then tackle the problem of discriminant analysis on such manifolds. To obtain an efficient method, we present a graph-based local discriminant analysis that uses within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability. We also develop the proposed framework over Riemannian manifolds more generally. Thorough experiments on face and object recognition, action recognition, texture classification and person re-identification indicate that the proposed method obtains a marked improvement in discrimination accuracy in comparison to several state-of-the-art methods.
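As an illustration of what embedding Grassmann manifolds into an RKHS can look like in practice, the sketch below represents each image set by an orthonormal basis and compares two bases with the projection kernel k(X, Y) = ||X^T Y||_F^2. The choice of the projection kernel, the helper names `subspace_basis` and `projection_kernel`, and the random data are assumptions made for illustration only; the text above does not state which kernel is used.

```python
import numpy as np


def subspace_basis(frames, dim):
    """Return an orthonormal basis (D x dim) for the dominant subspace of an image set.

    frames: array of shape (D, n) with one vectorised frame per column.
    """
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return u[:, :dim]


def projection_kernel(x, y):
    """Projection kernel between two orthonormal bases: ||X^T Y||_F^2."""
    return np.linalg.norm(x.T @ y, "fro") ** 2


# Hypothetical data: two image sets of 20 frames, each frame a 50-dimensional vector.
rng = np.random.default_rng(0)
set_a = rng.standard_normal((50, 20))
set_b = rng.standard_normal((50, 20))

xa = subspace_basis(set_a, dim=3)
xb = subspace_basis(set_b, dim=3)

# Gram matrix over the two subspaces; a kernel method would operate on this matrix.
points = (xa, xb)
gram = np.array([[projection_kernel(p, q) for q in points] for p in points])
```

The diagonal entries equal the subspace dimension, and the resulting Gram matrix is symmetric positive semi-definite, which is what allows kernel-based discriminant analysis to be applied to it.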
The second task addressed by this work is the clustering of data lying on Grassmann manifolds, which plays an essential role in data analysis. A novel clustering method is proposed by defining a measure of cluster distortion and embedding the manifolds such that the distortion is minimised. Furthermore, we extend the framework to the semi-supervised scenario. We show that the optimal solution is a generalised eigenvalue problem that can be solved very efficiently. We also develop a semi-supervised intrinsic Grassmann k-means algorithm and extend Locally Linear Embedding (LLE) and Laplacian Eigenmaps (LE) to Grassmann manifolds. Experiments on clustering synthetic data, human action sequences, face images, social behaviour and handwritten digits show that, in comparison to well-known methods, the proposed approach obtains a significant improvement in clustering accuracy while also being several orders of magnitude faster.
To address the third task, we propose a tracking approach based on affine subspaces, since subspaces can accommodate occlusion, pose and illumination variations, which is an essential precursor to obtaining a robust visual tracking system. We furthermore propose a novel approach to measuring affine subspace-to-subspace distance via the non-Euclidean geometry of Grassmann manifolds. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILTrack.
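To make the idea of an affine subspace-to-subspace distance more concrete, the sketch below models a set of image patches as an affine subspace (a mean offset plus an orthonormal basis) and combines the Euclidean distance between offsets with the Grassmann geodesic distance between bases, computed from principal angles. The weighted combination, the weight `alpha`, and all function names are hypothetical; this is one plausible construction under those assumptions, not necessarily the distance proposed in this work.

```python
import numpy as np


def affine_subspace(samples, dim):
    """Fit an affine subspace to samples (D x n): return (offset, orthonormal basis)."""
    mu = samples.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(samples - mu, full_matrices=False)
    return mu[:, 0], u[:, :dim]


def grassmann_geodesic(ua, ub):
    """Geodesic distance between two linear subspaces: 2-norm of the principal angles."""
    cosines = np.linalg.svd(ua.T @ ub, compute_uv=False)
    theta = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.linalg.norm(theta)


def affine_distance(a, b, alpha=0.5):
    """Assumed distance: weighted sum of the offset distance and the basis geodesic distance."""
    (mu_a, u_a), (mu_b, u_b) = a, b
    offset_term = np.linalg.norm(mu_a - mu_b)
    return alpha * offset_term + (1.0 - alpha) * grassmann_geodesic(u_a, u_b)


# Hypothetical data: two sets of 10 vectorised patches of dimension 50.
rng = np.random.default_rng(1)
model_a = affine_subspace(rng.standard_normal((50, 10)), dim=3)
model_b = affine_subspace(rng.standard_normal((50, 10)) + 2.0, dim=3)

d_ab = affine_distance(model_a, model_b)
```

In a tracker built along these lines, the candidate region whose affine subspace lies closest to the object model under such a distance would be selected as the new object location.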