Group-based classification with an application in cervical cancer screening

Samsudin, Noor Azah (2014). Group-based classification with an application in cervical cancer screening. PhD Thesis, School of Information Technology and Electrical Engineering, The University of Queensland.

Attached file: s4130975_phd_submission.pdf (thesis full text, application/pdf, 1.99 MB)
Author Samsudin, Noor Azah
Thesis Title Group-based classification with an application in cervical cancer screening
School, Centre or Institute School of Information Technology and Electrical Engineering
Institution The University of Queensland
Publication date 2014
Thesis type PhD Thesis
Supervisor Andrew Peter Bradley
Total pages 161
Total colour pages 24
Total black and white pages 137
Language eng
Subjects 0801 Artificial Intelligence and Image Processing
0899 Other Information and Computing Sciences
Formatted abstract
This thesis develops a set of data classification tools specifically tailored to medical diagnostic applications. In such applications, the data obtained from a patient typically consist of a large number of instances or objects that require analysis. Medical experts analyse such data holistically; that is, they scrutinise the instances or objects as a whole in order to arrive at a diagnosis. However, the traditional approach to automating these diagnostic procedures has employed a two-step classification process, in which the classifier analyses every instance or object individually before a final classification step. This two-step process is prone to two major problems: first, the more instances there are to analyse, the greater the computational complexity; second, the accuracy of the final classification step is highly dependent on the results of the individual instance analyses in the initial step. This thesis therefore explores a different approach, in which the data are classified as a group in a single step. This paradigm not only allows two-step classification problems to be solved more efficiently but is also more faithful to the one-step holistic process that medical experts normally apply. The thesis shows that existing automated classification approaches are not directly concerned with classifying a group of data in one step. By contrast, the premise of this thesis is that when it is known a priori that all the instances or objects in a sample belong to the same, but unknown, class, the group can be classified in a single step. This approach is referred to as group-based classification (GBC).
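To make the contrast between two-step and one-step classification concrete, here is a minimal sketch under assumptions that the abstract does not state: a Gaussian naive Bayes model, majority voting as the final step of the two-step process, and conditional independence of the instances within a group. All function names, class labels and data values are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not the thesis's implementation.

def fit_gaussian_nb(X, y):
    """Per-class feature means, variances and priors for a Gaussian naive Bayes model."""
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        # Variance floor avoids division by zero on constant features.
        variances = [max(sum((r[j] - means[j]) ** 2 for r in rows) / n, 1e-9)
                     for j in range(d)]
        params[label] = (means, variances, n / len(X))
    return params


def log_likelihood(x, means, variances):
    """log p(x | class) under the naive (feature-independence) assumption."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))


def two_step(group, params):
    """Step 1: label each instance independently; step 2 (assumed): majority vote."""
    votes = [max(params, key=lambda c: math.log(params[c][2])
                 + log_likelihood(x, params[c][0], params[c][1]))
             for x in group]
    return Counter(votes).most_common(1)[0][0]


def one_step(group, params):
    """Score the whole group at once: log P(c) + sum_i log p(x_i | c)."""
    return max(params, key=lambda c: math.log(params[c][2])
               + sum(log_likelihood(x, params[c][0], params[c][1]) for x in group))


# Toy training data with two hypothetical classes.
X = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [0.1, 0.2],
     [1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]]
y = ["normal"] * 4 + ["abnormal"] * 4
params = fit_gaussian_nb(X, y)

group = [[0.5, 0.5], [0.5, 0.5], [1.5, 1.5]]
print(two_step(group, params))  # normal (two of three votes)
print(one_step(group, params))  # abnormal
```

In this toy group, the per-instance votes are swayed by the two points that lie slightly nearer the `normal` mean, whereas the pooled group likelihood also accounts for how far the third point lies from that class. The example shows only that the two decision rules can disagree; it says nothing about which is correct for real data.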

Initially, a GBC technique is developed within a hypothesis-testing framework by converting a multidimensional classification problem into a one-dimensional one using an appropriate statistical summary. The one-dimensional data are then classified using a statistical hypothesis test, specifically an F-test, as a measure of group similarity. On both synthetic and real data sets, the proposed GBC technique outperforms existing two-step classifiers; indeed, in the empirical study it achieves an error rate of zero once the data size is sufficiently large. Next, a further set of GBC techniques is developed by extending the naive Bayes classifier and the nearest neighbour classifier (and its variants), yielding both one- and two-step GBC techniques. The results on the synthetic and real data sets clearly demonstrate that one-step GBC techniques can reduce the error rate in comparison to two-step classifiers; in fact, the one-step GBC is more effective than the two-step GBC on every data set tested. We also demonstrate the application of GBC to classifying malignancy-associated changes (MACs) data for cervical cancer screening. The GBC techniques developed earlier in the thesis are evaluated against other existing classifiers in terms of accuracy and the area under the receiver operating characteristic curve (AUC). An analysis of variance (ANOVA) is then used to test the significance of any differences between the cross-validated estimates of accuracy and AUC. The GBC techniques show favourable accuracy and a statistically significant improvement in AUC compared with the other classifiers.
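The F-test-based GBC technique described above can be illustrated with a minimal sketch. The abstract does not specify the summary statistic or the exact decision rule, so both are assumptions here. The hypothetical `summarise` function reduces each instance to the mean of its features. Group similarity is scored with a one-way ANOVA F statistic between the test group's summaries and each class's training summaries, and the class with the smallest F is chosen.

```python
import math
from statistics import fmean

# Hypothetical sketch: the summary statistic and decision rule are assumptions.

def summarise(instance):
    """Hypothetical 1-D summary of a multidimensional instance: its feature mean."""
    return fmean(instance)


def anova_f(a, b):
    """One-way ANOVA F statistic comparing two samples of scalars."""
    grand = fmean(a + b)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in (a, b))
    ss_within = sum((v - fmean(g)) ** 2 for g in (a, b) for v in g)
    df_between, df_within = 1, len(a) + len(b) - 2
    if ss_within == 0:  # degenerate case: no within-sample spread
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def f_test_gbc(group, training):
    """Assign the whole group to the class whose training summaries it most resembles."""
    z = [summarise(x) for x in group]
    scores = {label: anova_f(z, [summarise(x) for x in rows])
              for label, rows in training.items()}
    # A smaller F means less evidence that the group differs from that class.
    return min(scores, key=scores.get), scores


training = {
    "normal": [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [0.1, 0.2]],
    "abnormal": [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]],
}
label, scores = f_test_gbc([[0.9, 0.8], [1.1, 1.0], [1.0, 1.2]], training)
print(label)  # abnormal
```

Comparing raw F statistics is only reasonable here because both hypothetical classes have the same number of training instances. When training-set sizes differ, a fuller version might convert each F to a p-value (for example with `scipy.stats.f.sf`) so that the classes are compared on the same scale.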
Keyword Classification
Medical diagnostics
Pattern recognition
Machine learning
Group-based classification
Cervical cancer screening
Malignancy-associated changes (MACs)
F-test
Nearest neighbour
K-nearest neighbour

Created: Tue, 18 Feb 2014, 16:22:18 EST by Ms Noor Azah Samsudin on behalf of Scholarly Communication and Digitisation Service