The past decades have witnessed a transformation from a desktop-based web to a predominantly mobile web, where more often than not, users access the web from mobile devices. As a result, a huge volume of geo-textual web objects that have both geographical location and textual description have been generated. In literature, there have been lots of efforts in enabling efficient processing largescale geo-textual data under a variety of problem settings. In spite of the remarkable progress in this field, unsolved challenges remain. In this PhD thesis, I investigate a few interesting but challenging problems of this area. The contribution of this thesis can be summarized in three aspects. First, a set of new query predicates has been defined with a target of more diversified data types (e.g., activity trajectories), underlying spaces (e.g., road network) and query semantics, by which users can acquire their interested results more easily and effectively. Second, from a technical point of view, I have developed I/O efficient indexing structures and search algorithms to assure that a query can be answered within a reasonably small amount of time especially when dealing with massive geo-textual objects, which is not uncommon in real application scenarios. Last but not least, extensive empirical studies have been conducted based on real and large-scale datasets, which uncovered interesting patterns, rules and trends. These insights have shown directions for further improving my methodologies and also shed lights on future research. Below is a brief description of the contributions:
First, I study the keyword-oriented queries on activity trajectories (KOAT). The activity trajectory is semantically enriched trajectory data, which embeds the information about behaviors of the moving objects. Therefore, searching activity trajectory is able to support a variety of applications for a better quality of location-based services. This work aims to return k trajectories that contain the most relevant keywords to the query and yield the least travel effort in the meantime. The main difference between this work and conventional spatial keyword query is that no query location exists in KOAT, which means the search area cannot be localized. To improve the query performance, a spatial-textual ranking function is first proposed between query keywords and activity trajectories. Then a best-first search algorithm based on a hybrid index structure is developed by applying effective pruning rules and efficient refinement strategies.
Second, the problem of keyword-aware continuous k nearest neighbour search on road networks is studied, which computes the k nearest vertices that contain the query keywords issued by a moving object, and maintains the results continuously as it is moving on the road network. In particular, a framework, called labeling approach for continuous query (LARC), is proposed for processing the query with both low computation and communication overheads. First, a keyword-based pivot tree (KP-tree) is proposed to improve the efficiency of the static k nearest neighbour query by avoiding massive network traversals and sequential probe of keywords. Then, the concepts of dominance interval and region on road network are developed, which share the similar intuition with safe region in Euclidean space but are more complicated with a dedicated design.
Finally, a novel type of query called clue-based route search (CRS) is investigated, which allows a user to provide clues on keywords and spatial relationships along the route. These personalized requirements make the route search become distance-sensitive such that the distances between PoIs along the route must be as close as possible to the user specified distance. To improve efficiency, a branch-and-bound algorithm is proposed that prunes unnecessary vertices in query processing. In order to quickly locate candidate, an AB-tree is proposed that stores both the distance and keyword information in a tree structure. To further reduce the index size, a PB-tree is constructed by utilizing the virtue of 2-hop label index to pinpoint the candidate.