Record matching with non-key attribute values

Yang, Qiang, Li, Zhi-Xu, Jiang, Jun, Zhao, Peng-Peng, Liu, Guan-Feng, Liu, An and Zhou, Xiao-Fang (2016) Record matching with non-key attribute values. Jisuanji Xuebao/Chinese Journal of Computers, 39 10: 2075-2087. doi:10.11897/SP.J.1016.2016.02075

Author Yang, Qiang
Li, Zhi-Xu
Jiang, Jun
Zhao, Peng-Peng
Liu, Guan-Feng
Liu, An
Zhou, Xiao-Fang
Title Record matching with non-key attribute values
Language of Title eng
Journal name Jisuanji Xuebao/Chinese Journal of Computers   Check publisher's open access policy
ISSN 0254-4164
Publication date 2016-10-01
Sub-type Article (original research)
DOI 10.11897/SP.J.1016.2016.02075
Open Access Status Not yet assessed
Volume 39
Issue 10
Start page 2075
End page 2087
Total pages 13
Place of publication Beijing, China
Publisher Zhongguo Kexueyuan Jisuan Jishu Yanjiusuo / Chinese Academy of Sciences, Institute of Computing Technology
Language chi
Subject 1712 Software
1708 Hardware and Architecture
1705 Computer Networks and Communications
1704 Computer Graphics and Computer-Aided Design
Abstract The advanced development of various technologies on social network, e-commerce and online education has contributed to an increasing amount of large-scale network data. Among all sorts of network analysis tasks, one basic task is to search important nodes in a network. Closeness centrality is one of the popular metrics which measure the importance of a node in a network. Based on the closeness centrality, the basic task is called top-k closeness centrality search. However, the existing exact approaches cannot process large-scale networks because of their polynomial time complexity. Recently, some approximation algorithms are proposed, which achieve high performance by sacrificing the precision of results. But according to our study, we find that the loss of the precision of results is too much. To improve the precision of results while maintaining the high performance, in this paper, we propose a Sketch-based approximation algorithm for fast searching top-k closeness centrality in a large-scale network. The new algorithm is developed based on a new computation method which calculates the centrality by estimating the number of nodes within a certain distance by a data structure called FM-Sketch. The new algorithm has time complexity O(m tDmax), where t is a constant, Dmax is the diameter of a network and m is the number of edges in a network. With the small-world phenomenon assumption, the Sketch-based algorithm is a linear algorithm. Finally, we compare our Sketch-based algorithm with the state-of-the-art exact and approximation algorithms through extensive experiments. The results demonstrate the advantages of the new solution.
Formatted abstract
摘 要 实体匹配旨在找出不同数据源中指代同一实体的实例.已有的实体匹配方法大都基于实体主属性值的相
算法等有关技术较 Baseline匹配算法在匹配效率上高出10倍多.
Keyword Algorithm
Data quality
Non-key attribute
Record matching
Q-Index Code C1
Q-Index Status Provisional Code
Grant ID 61303019
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: HERDC Pre-Audit
School of Information Technology and Electrical Engineering Publications
Version Filter Type
Citation counts: Scopus Citation Count Cited 0 times in Scopus Article
Google Scholar Search Google Scholar
Created: Tue, 08 Nov 2016, 11:01:19 EST by System User on behalf of Learning and Research Services (UQ Library)