SharkDB: an in-memory storage system for large scale trajectory data management

Wang, Haozhou (2016). SharkDB: an in-memory storage system for large scale trajectory data management. PhD Thesis, School of Information Technology and Electrical Engineering, The University of Queensland. doi:10.14264/uql.2016.101

Attached file: s4186998_final_thesis.pdf (thesis, open access; application/pdf; 2.59 MB)
Author Wang, Haozhou
Thesis Title SharkDB: an in-memory storage system for large scale trajectory data management
School, Centre or Institute School of Information Technology and Electrical Engineering
Institution The University of Queensland
DOI 10.14264/uql.2016.101
Publication date 2016-02-26
Thesis type PhD Thesis
Supervisors Xiaofang Zhou; Shazia Sadiq
Total pages 171
Total colour pages 11
Total black and white pages 160
Language English
Subjects 080604 Database Management
Formatted abstract
The rapid development of location-based technologies, including animal tracking sensors, GPS devices embedded in taxis and buses, and smartphones carried by people, has made it possible to collect spatio-temporal information about almost any kind of moving object, producing huge volumes of spatio-temporal data in the form of trajectories. Business intelligence applications are increasingly interested in analysing large amounts of trajectory data, because querying such data can reveal useful information. Because hard disks have high access latency and limited I/O throughput, disk-based storage systems built on traditional data structures struggle to serve modern applications (e.g. location-based services) that require real-time responses when querying large-scale trajectory datasets. Novel data structures and query algorithms are therefore needed to meet this requirement. This thesis studies a series of concrete and challenging problems in storing, managing and analysing large-scale trajectory data. To address these problems and support real-time trajectory queries, a complete in-memory, column-oriented storage system called SharkDB is implemented. The main contributions are summarised below.

First, a preliminary study identifies the trajectory synchronisation problem in large-scale trajectory datasets. Based on this observation, a novel frame-based data structure is proposed that synchronises trajectories according to their temporal information. To improve the performance of trajectory queries, this structure is then optimised by implementing it in main memory with compression and CPU cache optimisation techniques.
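The abstract describes the frame-based structure only at a high level. As a minimal sketch, assuming that "synchronising trajectories based on their temporal information" means snapping each sample to a fixed-length time slot (a frame) and storing each frame's samples column by column, the idea can be illustrated as follows. `FrameStore`, `frame_seconds`, `snapshot` and `window_query` are hypothetical names chosen for this illustration; the thesis's compression and cache optimisations are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Samples from one time slot, stored column-wise (one list per attribute)."""
    object_ids: list = field(default_factory=list)
    xs: list = field(default_factory=list)
    ys: list = field(default_factory=list)


class FrameStore:
    """Hypothetical frame-based layout: trajectories synchronised by time slot."""

    def __init__(self, frame_seconds):
        self.frame_seconds = frame_seconds
        self.frames = defaultdict(Frame)  # frame index -> Frame

    def _frame_index(self, t):
        # Synchronisation step: every timestamp in [k*T, (k+1)*T) maps to frame k.
        return int(t // self.frame_seconds)

    def insert(self, object_id, t, x, y):
        frame = self.frames[self._frame_index(t)]
        frame.object_ids.append(object_id)
        frame.xs.append(x)
        frame.ys.append(y)

    def snapshot(self, t):
        """Positions of all objects in the frame containing t (last sample wins)."""
        frame = self.frames.get(self._frame_index(t))  # .get does not create a frame
        if frame is None:
            return {}
        return {oid: (x, y) for oid, x, y in zip(frame.object_ids, frame.xs, frame.ys)}

    def window_query(self, t, xmin, ymin, xmax, ymax):
        """IDs of objects inside the rectangle during the frame containing t."""
        frame = self.frames.get(self._frame_index(t))
        if frame is None:
            return []
        # Only one frame is touched, and its columns are scanned sequentially.
        # With contiguous arrays in a compiled language this scan is cache-friendly;
        # Python lists here only illustrate the shape of the layout.
        return sorted({
            oid
            for oid, x, y in zip(frame.object_ids, frame.xs, frame.ys)
            if xmin <= x <= xmax and ymin <= y <= ymax
        })


store = FrameStore(frame_seconds=60)
store.insert("taxi1", 5, 1.0, 2.0)
store.insert("taxi2", 30, 3.0, 4.0)
store.insert("taxi1", 65, 1.5, 2.5)
print(store.snapshot(10))                   # {'taxi1': (1.0, 2.0), 'taxi2': (3.0, 4.0)}
print(store.window_query(10, 0, 0, 2, 3))   # ['taxi1']
```

Because every object sampled in the same slot lands in the same frame, a time-slice query reads a single frame instead of probing each trajectory separately, which is the kind of benefit the synchronisation step is aiming for.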

With the frame-based data structure in place, the challenges of trajectory query processing are investigated. To address them, trajectory queries are divided into three categories: basic operations, advanced operations and analytic operations. For each category, a naive algorithm is proposed first. To improve query performance, parallel computing is used to reduce the running time of basic operations, a hierarchical I/P frame structure is proposed for advanced operations, and an MBR+KMP algorithm (combining minimum bounding rectangles with Knuth-Morris-Pratt string matching) is presented for analytic operations.
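The abstract names an MBR+KMP algorithm for analytic operations but does not define the query it answers. The sketch below assumes, purely for illustration, a sub-trajectory pattern query: find the trajectories that pass through the same sequence of grid cells as a query path. Each trajectory is discretised into cells (the grid size `cell` and every function name here are hypothetical), a minimum bounding rectangle (MBR) test prunes trajectories that cannot contain the pattern, and Knuth-Morris-Pratt (KMP) matching checks the survivors in time linear in the sequence length. This is one plausible way to combine the two techniques, not the thesis's algorithm.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of non-empty 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def to_cells(points, cell):
    """Hypothetical discretisation: map points to grid cells, collapsing repeats."""
    cells = []
    for x, y in points:
        c = (int(x // cell), int(y // cell))
        if not cells or cells[-1] != c:
            cells.append(c)
    return cells


def kmp_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_contains(text, pattern):
    """True if pattern occurs as a contiguous run in text (Knuth-Morris-Pratt)."""
    if not pattern:
        return True
    fail = kmp_failure(pattern)
    k = 0
    for item in text:
        while k > 0 and item != pattern[k]:
            k = fail[k - 1]
        if item == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


def pattern_query(trajectories, query_points, cell):
    """IDs of trajectories whose cell sequence contains the query's cell sequence."""
    pattern = to_cells(query_points, cell)
    qbox = mbr(pattern)
    hits = []
    for tid, pts in trajectories.items():
        cells = to_cells(pts, cell)  # a real system would precompute cells and MBRs
        if not cells or not mbr_intersects(mbr(cells), qbox):
            continue  # cheap spatial filter before the string match
        if kmp_contains(cells, pattern):
            hits.append(tid)
    return sorted(hits)


trajs = {
    "a": [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)],
    "b": [(5.5, 5.5), (6.5, 5.5)],
    "c": [(1.5, 0.5), (0.5, 0.5)],
}
print(pattern_query(trajs, [(0.2, 0.9), (1.1, 0.1)], cell=1.0))  # ['a']
```

The MBR is computed over cell coordinates rather than raw points, so the filter never discards a true match: any trajectory whose cell sequence contains the pattern has a cell MBR that covers the pattern's MBR. In the example, "b" is pruned by the MBR test, while "c" passes the filter but fails the KMP check because it visits the cells in the opposite order.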

To evaluate SharkDB, a comprehensive experimental study is conducted at both the operation level and the system level. In the operation-level evaluation, query processing with the proposed algorithms is compared against a traditional trajectory data structure. The extensive experiments demonstrate that the new algorithms can guarantee real-time trajectory query processing on large-scale trajectory datasets. For the system-level evaluation, a set of workload models reflecting real-world workloads is proposed, and experiments on these models also confirm the superiority of SharkDB over traditional data structures.

Finally, as in-memory database management systems receive increasing attention, several commercial in-memory database management systems have been released. In collaboration with SAP, SharkDB is therefore migrated into SAP HANA. To achieve this, the data structures of SharkDB are redesigned to suit the architecture of SAP HANA. A set of experiments shows that SharkDB outperforms other popular traditional data structures for trajectory query processing.
Keywords Spatio-temporal database; Trajectory database; In-memory database; Trajectory compression; Query processing

Document type: Thesis
Collections: UQ Theses (RHD) - Official; UQ Theses (RHD) - Open Access
Created: Mon, 22 Feb 2016, 13:31:00 EST by Haozhou Wang on behalf of Learning and Research Services (UQ Library)