Accent Classification from Speech Samples by Use of Machine Learning

Pedersen, Carol (2009). Accent Classification from Speech Samples by Use of Machine Learning. PhD Thesis, School of Information Technology and Electrical Engineering, The University of Queensland.

Attached file: s40940081_phd_totalthesis.pdf (full thesis, PDF, 1.93 MB)
Author: Pedersen, Carol
Thesis title: Accent Classification from Speech Samples by Use of Machine Learning
School, centre or institute: School of Information Technology and Electrical Engineering
Institution: The University of Queensland
Publication date: 2009-07
Thesis type: PhD Thesis
Supervisors: Joachim Diederich, Xue Li, John Ingram
Total pages: 216
Total colour pages: 3
Total black and white pages: 213
Subjects: 0906 Electrical and Electronic Engineering
Abstract:
“Accent” is the pattern of speech pronunciation by which one can identify a person’s linguistic, social or cultural background. It is an important source of inter-speaker variability and a particular problem for automated speech recognition. The aim of the study was to investigate a new computational approach to accent classification that did not require phonemic segmentation or phoneme identification as input, and that could therefore serve as a simple, effective accent classifier.

Through a series of structured experiments, this study investigated the effectiveness of Support Vector Machines (SVMs) for speech accent classification using time-based units rather than linguistically informed ones, and compared their accuracy with that of other machine learning methods and with the ability of human listeners to classify speech by accent. A corpus of read speech was collected in two accents of English (Arabic and “Indian”) and used as the main data source for the experiments. Mel-frequency cepstral coefficients (MFCCs) were extracted from the speech samples and combined into larger units of 10 to 150 ms duration, which then formed the input data for the various machine learning systems. SVMs classified the samples with up to 97.5% accuracy, with very high precision and recall, using samples of between 1 and 4 seconds of speech. This compared favourably with a human listener study in which subjects distinguished between the two accent groups with an average accuracy of 92.5% in approximately 8 seconds. Repeating the SVM experiments on a different corpus gave a best classification accuracy of 84.6%. Experiments on the original corpus using a decision tree learner and a rule-based classifier gave a best accuracy of 95%, but their results varied much more across conditions than those of the SVM. Rule extraction was performed to help explain the results and to better inform the design of the system.

The new approach was therefore shown to be effective for accent classification, and a plan was developed for its role within various larger speech-related contexts.
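The abstract states only that MFCCs were combined into units of 10 to 150 ms; it does not say how. The sketch below is a minimal illustration, assuming a 10 ms spacing between consecutive MFCC frames and assuming that the frames within a unit are concatenated, in order, into one feature vector. Neither assumption is taken from the thesis, and the names `group_frames` and `FRAME_HOP_MS` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical: assumed spacing between consecutive MFCC frames (not stated in the thesis record).
FRAME_HOP_MS = 10


def group_frames(frames: Sequence[Sequence[float]], unit_ms: int,
                 hop_ms: int = FRAME_HOP_MS) -> List[List[float]]:
    """Combine consecutive per-frame MFCC vectors into fixed-duration units.

    Each unit covers unit_ms of speech, i.e. unit_ms // hop_ms frames, and is
    formed by concatenating those frames in order (an assumed combination rule).
    Trailing frames that do not fill a complete unit are dropped.
    """
    if hop_ms <= 0 or unit_ms < hop_ms or unit_ms % hop_ms != 0:
        raise ValueError("unit_ms must be a positive multiple of hop_ms")
    per_unit = unit_ms // hop_ms
    units: List[List[float]] = []
    # Step through the frames one whole unit at a time.
    for start in range(0, len(frames) - per_unit + 1, per_unit):
        unit: List[float] = []
        for frame in frames[start:start + per_unit]:
            unit.extend(frame)
        units.append(unit)
    return units


if __name__ == "__main__":
    # One second of dummy 13-coefficient frames at the assumed 10 ms hop.
    frames = [[float(i)] * 13 for i in range(100)]
    for unit_ms in (10, 50, 150):
        units = group_frames(frames, unit_ms)
        print(unit_ms, "ms:", len(units), "units of", len(units[0]), "values")
```

For example, one second of speech at a 10 ms hop yields 100 frames; with 13 coefficients per frame (again an assumption) and 150 ms units, this produces 6 units of 195 values each, and the 10 leftover frames are discarded. Concatenation keeps the order of frames within a unit, whereas averaging would give a fixed 13-value vector regardless of unit length; the thesis record does not say which rule was used.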
Keywords: accent; accent classification; machine learning; support vector machines; mel-frequency cepstral coefficients; rule extraction; human listeners; corpus evaluation

Created: Fri, 18 Jun 2010, 18:22:08 EST by Mrs Carol Pedersen on behalf of Library - Information Access Service