Sensor technologies and wireless sensor networks are enabling the capture and storage of large volumes of sensor data streams. However, a number of characteristics of sensor data streams hinder the sharing, analysis and re-use of such data on the Web. For example, because sensor data is both temporal and spatial in nature, its multi-dimensionality, combined with variations in granularity, makes it more difficult to analyse and interpret. These issues have created major challenges for the management, representation, analysis and indexing of large volumes of sensor data streams. Consequently, there is an urgent need to mark up sensor data streams with well-defined semantics to drive the development of advanced applications such as situation awareness, predictive models and event detection. Given well-structured and semantically annotated sensor data streams, it is possible to reason across them to deduce new or implicit knowledge, discover significant (and erroneous) data and events, and answer complex queries.
This thesis focuses on the application area of ecosystem monitoring. As such, it investigates novel solutions to the semantic annotation and reasoning challenges associated with sensor data streams acquired by ecosystem scientists who are monitoring: a) species behaviour and b) micro-climate changes within environmentally-sensitive regions. Within this context, it covers the design, implementation and evaluation of innovative methods to tackle different challenges associated with the semantic annotation of, and reasoning over, two classes of sensor data: a) animal accelerometry data streams (acquired via animal-attached tri-axial accelerometers); and b) environmental sensor data streams (acquired from wireless sensor networks). These two categories of sensor data are of particular interest because they are rapidly growing in volume, they present related but distinct challenges, and there is a need to correlate them in order to determine whether changes in the environment are affecting species behaviour.
The first component of the thesis investigates how best to combine domain expert annotations and machine learning to improve the precision and efficiency of semantic annotations on 3D accelerometry data streams (to support animal behaviour recognition and analysis). The second component seeks to minimize the cost and effort involved in developing training corpora for machine learning approaches, by evaluating an Optimal Graph Learning approach to automatic semantic annotation of 3D accelerometry data streams. The third component of the thesis tackles the problem of detecting, annotating and filtering errors and outliers in sensor data streams from wireless sensor networks employed for environmental monitoring. The fourth and final component investigates, implements and evaluates an approach for reasoning across multiple environmental sensor data streams to infer higher-level knowledge (fire weather indices to predict bush fire risk).
In addition to an introduction and a literature review of the field, this thesis provides detailed descriptions and evaluations of the following four original contributions to the field:
• The SAAR (Semantic Annotation and Activity Recognition) approach, which is designed to help biologists automatically recognize animal activities from 3D accelerometry data streams by combining an expert tagging service with machine learning algorithms. The experimental results show that SAAR enables ecologists with little knowledge of machine learning techniques to collaboratively build classification models with high levels of accuracy, sensitivity and specificity. The results also indicate that SAAR is able to use data from surrogate individuals to qualify and quantify the association between individual behavioural modes and tri-axial accelerometry data streams, and to apply the resulting model to similar species.
• The OGL (Optimal Graph Learning) approach, which is designed to enable semi-automatic annotation of animal accelerometry data streams by more accurately encoding similarities between data points. The OGL approach is compared with SAAR, and the experimental results show that OGL consistently outperforms SAAR, especially with a smaller number of annotated training samples. Moreover, additional experiments on image classification using three real-world image datasets demonstrate the superiority of OGL over existing graph construction methods, and performance comparable to state-of-the-art learning methods that rely on large, manually annotated training corpora.
• The SOUE-Detector (Segment Outliers and Unusual Events Detector) approach, which adopts ontologies and expert-defined, machine-processable rules (that define correlations between sensor properties) to detect and distinguish between erroneous segment outliers and genuine unusual events in wireless sensor networks. Experiments on real-world sensor network datasets reveal that the proposed approach is able to efficiently and accurately detect both erroneous outliers and unusual events by making use of sensor data trend similarities and correlations between sensor properties.
• The SFWI (Semantic Fire Weather Index) approach, which aims to estimate fire weather indices by reasoning across cleaned wireless sensor network data streams represented in RDF. The experimental results demonstrate: performance comparable to state-of-the-art detection methods; the ability to generate more precise, spatio-temporally finer-grained Fire Weather Indices than those currently available; and greatly improved query speed when running repeated queries over an extended period.
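To make the idea of deriving a fire weather index from cleaned sensor readings more concrete, the following is a minimal sketch only. The text above does not state which index SFWI computes or how the reasoning over RDF is implemented. As an assumption, the sketch uses the McArthur Forest Fire Danger Index (Mark 5) in the closed-form formulation of Noble et al. (1980), and it operates on plain Python values rather than RDF. The names `WeatherReading` and `mcarthur_ffdi` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WeatherReading:
    """One cleaned observation (hypothetical structure, not from the thesis)."""
    temperature_c: float          # air temperature in degrees Celsius
    relative_humidity_pct: float  # relative humidity in percent (0-100)
    wind_speed_kmh: float         # wind speed in km/h
    drought_factor: float         # fuel availability on a 0-10 scale


def mcarthur_ffdi(reading: WeatherReading) -> float:
    """Forest Fire Danger Index, Noble et al. (1980) formulation.

    Assumption: this index stands in for whichever fire weather
    index the SFWI approach actually estimates.
    """
    if not 0 < reading.drought_factor <= 10:
        raise ValueError("drought_factor must be in (0, 10]")
    if not 0 <= reading.relative_humidity_pct <= 100:
        raise ValueError("relative_humidity_pct must be in [0, 100]")
    if reading.wind_speed_kmh < 0:
        raise ValueError("wind_speed_kmh must be non-negative")

    exponent = (
        -0.450
        + 0.987 * math.log(reading.drought_factor)
        - 0.0345 * reading.relative_humidity_pct
        + 0.0338 * reading.temperature_c
        + 0.0234 * reading.wind_speed_kmh
    )
    return 2.0 * math.exp(exponent)


if __name__ == "__main__":
    hot_dry_day = WeatherReading(
        temperature_c=30.0,
        relative_humidity_pct=20.0,
        wind_speed_kmh=20.0,
        drought_factor=10.0,
    )
    print(f"FFDI: {mcarthur_ffdi(hot_dry_day):.1f}")
```

Because the index depends only on quantities that a sensor node can report, or that can be derived per location, computing it per node and per time step is one plausible way to obtain spatio-temporally finer-grained values than a regional forecast provides.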