Decision support methods for community-driven knowledge curation platforms in the biomedical domain

Paul, Razan (2013). Decision support methods for community-driven knowledge curation platforms in the biomedical domain. PhD Thesis, The School of Information Technology and Electrical Engineering, The University of Queensland. doi:10.14264/uql.2014.282

File: s4242044_phd_submission.pdf (thesis, open access; application/pdf; 3.76 MB)

Author Paul, Razan
Thesis Title Decision support methods for community-driven knowledge curation platforms in the biomedical domain
School, Centre or Institute The School of Information Technology and Electrical Engineering
Institution The University of Queensland
DOI 10.14264/uql.2014.282
Publication date 2013
Thesis type PhD Thesis
Supervisor Jane Hunter
Tudor Groza
Andreas Zankl
Total pages 132
Total colour pages 15
Total black and white pages 117
Language eng
Subjects 0801 Artificial Intelligence and Image Processing
0806 Information Systems
Formatted abstract
The skeletal dysplasia domain is characterised by highly complex, heterogeneous and sparse data. In this domain, the analysis and interpretation of new patient cases rely on comparisons with past case studies, because there are no defined guidelines and domain knowledge for this group of genetic disorders is still immature. To analyse and interpret new patient cases, practitioners currently query many heterogeneous data sources and aggregate diverse types of data. This integration is a significant challenge because of the extreme heterogeneity of data models, metadata schemas, vocabularies and data formats, and because of inconsistencies in naming and identification conventions. There is therefore an urgent need to develop medical decision support systems that can help clinicians and researchers improve their understanding of the causes, behaviours, symptoms and attributes of these diseases and support them in decision-making (e.g., diagnosis).

In the skeletal dysplasia domain, the absence of mature domain knowledge and the lack of documented, well-structured past cases, together with the general sparseness of skeletal dysplasia data, hinder the development of reliable decision support methods. This thesis takes a first step towards a decision support framework that assists clinicians, doctors and researchers in this domain.

The developed framework is a phenotype-disorder lifecycle that assists practitioners in finalising patient cases by moving from phenotypes to disorders and vice versa. The framework was developed in four phases and combines ontological techniques with inductive and statistical reasoning. In the first phase, the underlying data characteristics were analysed and a novel machine learning approach was developed to produce probabilistic candidate rankings that can support medical decision-making (e.g., diagnosis). In the second phase, a semantic similarity-based approach was developed that exploits the semantics encoded in the domain to find possible disorders for a new patient case. In the third phase, implicit relationships between ontological concepts (e.g., phenotypes and disorders) were discovered by combining semantic similarity methods, formulated from the intrinsic structure of a given ontology, with traditional interestingness measures during association rule mining. In the final phase, a data mining approach was proposed for discovering the characteristic features of each disorder within a set of disorders.
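The semantic similarity-based matching of the second phase can be illustrated with a minimal sketch. It assumes one widely used ontology-based measure, Resnik similarity (the information content of the most informative common ancestor), aggregated over term sets with a best-match average; the thesis may use a different measure. The toy ontology, the disorder annotations and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy is-a hierarchy (child -> parents); term names are illustrative only.
PARENTS = {
    "abnormality_of_skeleton": [],
    "abnormality_of_limbs": ["abnormality_of_skeleton"],
    "short_limbs": ["abnormality_of_limbs"],
    "rhizomelia": ["short_limbs"],
    "mesomelia": ["short_limbs"],
    "abnormality_of_spine": ["abnormality_of_skeleton"],
    "platyspondyly": ["abnormality_of_spine"],
}

# Hypothetical disorder annotations (disorder -> set of phenotype terms).
DISORDERS = {
    "disorder_A": {"rhizomelia", "platyspondyly"},
    "disorder_B": {"mesomelia"},
    "disorder_C": {"platyspondyly"},
}


def ancestors(term):
    """Return the term itself plus all of its is-a ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = log(N / n_t), where n_t counts disorders annotated with t or a descendant of t."""
    counts = defaultdict(int)
    for terms in annotations.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    n = len(annotations)
    return {t: math.log(n / c) for t, c in counts.items()}


def resnik(t1, t2, ic):
    """IC of the most informative common ancestor (the single root guarantees one exists)."""
    return max(ic.get(a, 0.0) for a in ancestors(t1) & ancestors(t2))


def best_match_average(patient, disorder, ic):
    """Symmetric best-match average of pairwise Resnik similarities between two term sets."""
    forward = sum(max(resnik(p, d, ic) for d in disorder) for p in patient) / len(patient)
    backward = sum(max(resnik(p, d, ic) for p in patient) for d in disorder) / len(disorder)
    return (forward + backward) / 2


def rank_disorders(patient, annotations):
    """Return (disorder, score) pairs sorted from most to least similar to the patient."""
    ic = information_content(annotations)
    scores = {name: best_match_average(patient, terms, ic) for name, terms in annotations.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_disorders({"rhizomelia"}, DISORDERS):
        print(f"{name}: {score:.3f}")
```

For a patient annotated only with rhizomelia, this ranks disorder_A first, disorder_B second and disorder_C last. An exact term-matching score would treat disorder_B and disorder_C alike, since neither shares a term with the patient, whereas the Resnik score credits disorder_B for the shared ancestor short_limbs.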

From the data and domain knowledge perspective, the developed approaches and evaluation strategies led to a number of significant findings towards building a fully-fledged decision support framework: (i) the data in the domain are characterised by rareness, sparseness and high dimensionality; (ii) the domain knowledge introduced more noise than the patient cases did; and (iii) semantic similarity improves overall accuracy in comparison with term-based matching.

From the algorithmic perspective, there are three significant findings: (i) tailoring the machine learning approach to take the sparseness of the data into account improves the overall accuracy of disorder classification; (ii) mining features that are common within a given class (rather than across the entire dataset) and rare in the other classes discovers more accurate characteristic features for disorders than the class association algorithm does; and (iii) using semantic interestingness measures instead of traditional measures when discovering association rules yields considerable improvements in accuracy.
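The class-relative feature mining described in finding (ii) can be sketched as follows. The sketch assumes that a set of phenotypes is kept as characteristic of a disorder when its support among that disorder's cases is at least `min_in_class` and its support among all other cases is at most `max_out_class`. The cases, threshold values and function names are hypothetical illustrations, not the thesis's algorithm or data.

```python
from itertools import combinations

# Hypothetical labelled patient cases: (disorder label, set of observed phenotype terms).
CASES = [
    ("disorder_A", {"rhizomelia", "platyspondyly", "short_stature"}),
    ("disorder_A", {"rhizomelia", "platyspondyly"}),
    ("disorder_A", {"rhizomelia", "short_stature"}),
    ("disorder_B", {"mesomelia", "short_stature"}),
    ("disorder_B", {"mesomelia", "platyspondyly"}),
]


def support(itemset, cases):
    """Fraction of the given phenotype sets that contain every item in itemset."""
    if not cases:
        return 0.0
    return sum(itemset <= terms for terms in cases) / len(cases)


def characteristic_features(cases, target, min_in_class=0.6, max_out_class=0.2, max_size=2):
    """Itemsets frequent within the target class (class-relative support) but rare in all other classes."""
    inside = [terms for label, terms in cases if label == target]
    outside = [terms for label, terms in cases if label != target]
    vocabulary = sorted(set().union(*inside))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            itemset = frozenset(combo)
            s_in, s_out = support(itemset, inside), support(itemset, outside)
            if s_in >= min_in_class and s_out <= max_out_class:
                results.append((combo, s_in, s_out))
    return results


if __name__ == "__main__":
    for combo, s_in, s_out in characteristic_features(CASES, "disorder_A"):
        print(combo, round(s_in, 2), round(s_out, 2))
```

Computing support within the class rather than over the whole dataset matters for rare disorders. In this toy data, mesomelia occurs in only 40% of all cases, so a dataset-wide minimum support of 0.6 would discard it, yet it occurs in every disorder_B case and is returned by `characteristic_features(CASES, "disorder_B")`.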

The outcome of this research enables clinicians and researchers to acquire a critical mass of structured knowledge that will facilitate a better understanding of these genetic diseases and foster advances in the field. Furthermore, because the approach can be applied to other rare diseases, it will advance the discipline of medical informatics.
Keyword Decision support
Skeletal Dysplasia
Machine learning
Semantic similarity

Document type: Thesis
Collections: UQ Theses (RHD) - Official
UQ Theses (RHD) - Open Access
Created: Sun, 24 Aug 2014, 17:38:17 EST by Razan Paul on behalf of Scholarly Communication and Digitisation Service