Approximate entity extraction in temporal databases

Lu, Wei, Fung, Gabriel Pui Cheong, Du, Xiaoyong, Zhou, Xiaofang, Chen, Lijiang and Deng, Ke (2011) Approximate entity extraction in temporal databases. World Wide Web, 14 2: 157-186. doi:10.1007/s11280-011-0109-5


Author Lu, Wei
Fung, Gabriel Pui Cheong
Du, Xiaoyong
Zhou, Xiaofang
Chen, Lijiang
Deng, Ke
Title Approximate entity extraction in temporal databases
Journal name World Wide Web   Check publisher's open access policy
ISSN 1386-145X
1573-1413
Publication date 2011-03-01
Sub-type Article (original research)
DOI 10.1007/s11280-011-0109-5
Open Access Status
Volume 14
Issue 2
Start page 157
End page 186
Total pages 30
Place of publication New York, United States
Publisher Springer
Formatted abstract
We study the problem of efficiently extracting K entities, in a temporal database, which are most similar to a given search query. This problem is well studied in relational databases, where each entity is represented as a single record and there exist a variety of methods to define the similarity between a record and the search query. However, in temporal databases, each entity is represented as a sequence of historical records. How to properly define the similarity of each entity in the temporal database still remains an open problem. The main challenging is that, when a user issues a search query for an entity, he or she is prone to mix up information of the same entity at different time points. As a result, methods, which are used in relational databases based on record granularity, cannot work any further. Instead, we regard each entity as a set of "virtual records", where attribute values of a "virtual record" can be from different records of the same entity. In this paper, we propose a novel evaluation model, based on which the similarity between each "virtual record" and the query can be effectively quantified, and the maximum similarity of its "virtual records" is taken as the similarity of an entity. For each entity, as the number of its "virtual records" is exponentially large, calculating the similarity of the entity is challenging. As a result, we further propose a Dominating Tree Algorithm (DTA), which is based on the bounding-pruning-refining strategy, to efficiently extract K entities with greatest similarities. We conduct extensive experiments on both real and synthetic datasets. The encouraging results show that our model for defining the similarity between each entity and the search query is effective, and the proposed DTA can perform at least two orders of magnitude improvement on the performance comparing with the naive approach.
Keyword Approximate entity extraction
Bounding-pruning-refining
n-partite graph
Temporal databases
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collection: School of Information Technology and Electrical Engineering Publications
 
Versions
Version Filter Type
Citation counts: TR Web of Science Citation Count  Cited 3 times in Thomson Reuters Web of Science Article | Citations
Scopus Citation Count Cited 3 times in Scopus Article | Citations
Google Scholar Search Google Scholar
Created: Wed, 27 Nov 2013, 18:15:07 EST by System User on behalf of School of Information Technol and Elec Engineering