Online human gesture recognition using depth camera

Zhao, Xin (2014). Online human gesture recognition using depth camera. PhD Thesis, Information Technology & Electrical Engineering, The University of Queensland. doi:10.14264/uql.2014.198

       
Author Zhao, Xin
Thesis Title Online human gesture recognition using depth camera
School, Centre or Institute Information Technology & Electrical Engineering
Institution The University of Queensland
DOI 10.14264/uql.2014.198
Publication date 2014-01-01
Thesis type PhD Thesis
Supervisors Xue Li, Chaoyi Pang
Total pages 147
Total colour pages 18
Total black and white pages 129
Language eng
Subjects 080109 Pattern Recognition and Data Mining
080104 Computer Vision
080602 Computer-Human Interaction
Formatted abstract
Online human gesture recognition has a wide range of applications in computer vision and pattern recognition, including human-computer interaction, electronic entertainment, video surveillance, patient monitoring, nursing homes, and smart homes. The recent introduction of cost-effective depth cameras has started a new trend of research on body-movement gesture recognition. However, there are five major challenges: i) how to continuously recognize gestures from unsegmented streams, ii) how to differentiate different styles of the same gesture from other types of gestures, iii) how to train gesture classifiers that preserve the continuity of streams, iv) how to train gesture classifiers with both labelled and unlabelled skeleton data, and v) how to automatically calibrate multiple depth cameras to obtain accurate skeleton data. In this dissertation, we aim to solve these five problems with effective and efficient solutions, and we evaluate our proposed approaches on benchmark datasets.

Firstly, we proposed a simple approach based on template matching. For each predefined gesture type, a template is learned from all the instances of that gesture type. A template consists of two sequences: an average sequence, which represents the standard movement, and a deviation sequence, which represents personal style variation. At every frame of an incoming stream, we compute the optimal subsequence match against each template. The frame is assigned the gesture type of the template with the minimal distance, provided that distance is smaller than a given threshold. In this way, we achieve online gesture recognition without gesture instance segmentation. Experiments were conducted to show the feasibility of our approach.
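
As a minimal sketch of how such frame-level template matching could be implemented (the names, the deviation-normalised frame cost, and the DTW-style subsequence alignment below are assumptions for illustration, not the thesis implementation):

```python
# Minimal sketch of online template matching over a skeleton stream.
# Each template stores an average sequence `mean` and a per-frame
# deviation sequence `std` (both T x D arrays).
import numpy as np

def frame_cost(frame, mean_t, std_t):
    """Deviation-normalised distance between one stream frame and one template frame."""
    return np.linalg.norm((frame - mean_t) / (std_t + 1e-6))

def subsequence_distance(window, template):
    """DTW-style alignment cost of the stream window ending at the current frame."""
    mean, std = template["mean"], template["std"]
    n, m = len(window), len(mean)
    D = np.full((n + 1, m + 1), np.inf)
    D[:, 0] = 0.0                      # free start point within the stream window
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = frame_cost(window[i - 1], mean[j - 1], std[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / m                 # normalise by template length

def classify_frame(window, templates, threshold):
    """Label the current frame with the best-matching gesture type, or None."""
    dists = {g: subsequence_distance(window, t) for g, t in templates.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < threshold else None
```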

Next, a novel representation is proposed for extracting features of human skeletons at the level of human-body-part movements. This feature also captures inherent human motion characteristics, so that an explicit prior segmentation step can be avoided. We call this new feature Structured Streaming Skeletons (SSS). In this representation, the structure of streaming skeletons is described by a combination of human-body-part movements. Because of the discriminative nature of the SSS feature, superior performance is achieved even with a simple classifier.
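
A toy sketch of a body-part level skeleton feature in the spirit of SSS; the joint grouping and the displacement-statistics descriptor below are illustrative assumptions, not the thesis definition:

```python
# Illustrative body-part level feature over a short skeleton window.
import numpy as np

BODY_PARTS = {                       # hypothetical joint-index grouping
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
    "torso":     [0, 1, 2, 3],
}

def part_features(skeleton_window):
    """skeleton_window: (T, J, 3) array of joint positions over a short window.
    Returns one motion descriptor per body part (frame-to-frame displacement stats)."""
    feats = {}
    for part, joints in BODY_PARTS.items():
        traj = skeleton_window[:, joints, :]            # (T, |part|, 3)
        disp = np.diff(traj, axis=0)                    # frame-to-frame motion
        feats[part] = np.concatenate([disp.mean(axis=(0, 1)),
                                      disp.std(axis=(0, 1))])
    return feats
```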

In order to preserve the continuity of skeleton streams, we proposed a new machine learning method for online gesture recognition, namely Transitional Learning. Besides positive and negative samples, we introduce a new kind of sample, namely transitional samples, which carry soft labels. We also add a continuity constraint that smooths the predicted gesture labels into a series of consistent labels. Our comprehensive experiments on several large public datasets demonstrate that the proposed Transitional Learning algorithms outperform traditional machine learning algorithms.
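
A conceptual sketch of an objective with soft-labelled transitional samples and a label-continuity penalty; the squared-error form and the weight `lam` are assumptions for illustration, not the thesis objective:

```python
# Toy loss combining hard labels, soft (transitional) labels and temporal smoothness.
import numpy as np

def transitional_loss(scores, hard_labels, soft_labels, lam=1.0):
    """scores: (T, C) classifier outputs for consecutive frames.
    hard_labels: (T, C) one-hot targets for frames with definite labels, zero rows elsewhere.
    soft_labels: (T, C) fractional targets for transitional frames, zero rows elsewhere.
    Returns the two data terms plus a temporal-smoothness term."""
    hard_mask = hard_labels.sum(axis=1, keepdims=True) > 0
    soft_mask = soft_labels.sum(axis=1, keepdims=True) > 0
    data_hard = np.sum((scores - hard_labels) ** 2 * hard_mask)
    data_soft = np.sum((scores - soft_labels) ** 2 * soft_mask)
    continuity = np.sum((scores[1:] - scores[:-1]) ** 2)   # penalise abrupt label changes
    return data_hard + data_soft + lam * continuity
```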

In order to train gesture classifiers with both labelled and unlabelled skeleton data, we proposed a novel semi-supervised learning algorithm called Semi-supervised Discriminant analysis with Global constraint (SDG). It combines the supervised algorithm LDA on the labelled training data with the unsupervised algorithms LPP and PCA on all training data, both labelled and unlabelled, to better estimate the data distribution. The trace ratio formulation, which is more natural and accurate than the ratio trace formulation, is used to solve the SDG optimization. For evaluation we use the public mocap dataset HumanEva, captured by a marker-based motion capture system, and a skeleton dataset we collected with a depth camera. Experimental results demonstrate the effectiveness of our algorithm.
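
A sketch of an iterative trace-ratio solver of the form max_W tr(W'AW)/tr(W'BW) with orthonormal W, as could be used for an SDG-style objective; how A and B combine the LDA, LPP and PCA scatter matrices below is an assumption for illustration:

```python
# Iterative trace-ratio optimisation (lambda fixed-point updates).
import numpy as np

def trace_ratio(A, B, dim, iters=20):
    """Maximise tr(W'AW)/tr(W'BW) over orthonormal W with `dim` columns."""
    lam = 0.0
    for _ in range(iters):
        vals, vecs = np.linalg.eigh(A - lam * B)      # symmetric eigendecomposition
        W = vecs[:, np.argsort(vals)[::-1][:dim]]     # top-`dim` eigenvectors
        lam_new = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
        if abs(lam_new - lam) < 1e-8:
            break
        lam = lam_new
    return W

# Hypothetical combination of scatter matrices (alpha, beta are illustrative weights):
#   A = Sb + alpha * Spca   (between-class scatter plus global variance, to maximise)
#   B = Sw + beta  * Slpp   (within-class scatter plus locality graph term, to minimise)
```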

Finally, we proposed a method to automatically calibrate a multiple-depth-camera system. Surface consistency is used as a volumetric criterion to measure the quality of pairwise surface alignment, and classic simulated annealing is applied for the optimization. In this way, we obtain a 3D model of the human body, which can improve the recognition of the human skeleton. Analysis and results of comprehensive experiments on both synthetic and real-world datasets demonstrate the feasibility of our approach.
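
A minimal simulated-annealing sketch for pairwise camera alignment; the surface-consistency score is approximated here by negative nearest-neighbour distance between point clouds, which is a stand-in assumption rather than the thesis criterion:

```python
# Simulated annealing over a 6-DoF rigid transform between two depth cameras.
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def alignment_score(src, dst_tree, pose):
    """pose = (rx, ry, rz, tx, ty, tz); higher means better alignment."""
    R = Rotation.from_euler("xyz", pose[:3]).as_matrix()
    moved = src @ R.T + pose[3:]
    d, _ = dst_tree.query(moved)
    return -d.mean()

def calibrate_pair(src, dst, steps=2000, t0=0.1):
    """Estimate the rigid transform mapping point cloud `src` onto `dst`."""
    dst_tree = cKDTree(dst)
    pose = np.zeros(6)
    score = alignment_score(src, dst_tree, pose)
    best, best_score = pose.copy(), score
    temp = t0
    for k in range(steps):
        cand = pose + np.random.normal(scale=[0.01] * 3 + [0.005] * 3)
        cand_score = alignment_score(src, dst_tree, cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_score > score or np.random.rand() < np.exp((cand_score - score) / temp):
            pose, score = cand, cand_score
            if score > best_score:
                best, best_score = pose.copy(), score
        temp = t0 * (1 - k / steps) + 1e-6   # simple linear cooling schedule
    return best, best_score
```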
Keyword Data mining
Machine learning
Gesture recognition
Template matching
Camera calibration

 