Canonicalization of graph database records using similarity measures

Li, Na, Li, Qing and Wang, Liping (2008). Canonicalization of graph database records using similarity measures. In: Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication, ICUIMC-2008. 2nd International Conference on Ubiquitous Information Management and Communication, ICUIMC-2008, Suwon, South Korea, (278-283). 31 January - 1 February 2008. doi:10.1145/1352793.1352853


Author Li, Na
Li, Qing
Wang, Liping
Title of paper Canonicalization of graph database records using similarity measures
Conference name 2nd International Conference on Ubiquitous Information Management and Communication, ICUIMC-2008
Conference location Suwon, South Korea
Conference dates 31 January - 1 February 2008
Proceedings title Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication, ICUIMC-2008
Place of Publication New York, NY United States
Publisher ACM
Publication Year 2008
Year available 2008
Sub-type Fully published paper
DOI 10.1145/1352793.1352853
Open Access Status
ISBN 9781595939937
Start page 278
End page 283
Total pages 6
Language eng
Abstract/Summary Information extraction and crawling from the Web have become increasingly common, yet the raw data are often noisy and redundant because they come from heterogeneous sources. Although much work has focused on duplicate record detection, there has been little investigation into presenting users with a uniform, standard result derived from the duplicates, which we refer to as a canonical result; the process of producing it is referred to as record canonicalization. In this paper, we focus on imperfect and duplicate documents on the Web and propose a preprocessing method of graph canonicalization. We first formalize the problem of graph record canonicalization and then propose three possible solutions in turn. Based on this framework, we implement graph selection canonicalization, which aims to construct a canonical graph by selecting the central graph among the records. Experimental results demonstrate its performance in representing real-world entities.
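The abstract describes graph selection canonicalization as choosing the central graph among a set of duplicate records. This record does not state the graph representation or the similarity measure the authors used, so the following is only a hypothetical sketch under stated assumptions: each record is modelled as a set of labelled edges, Jaccard similarity over edge sets stands in for the paper's similarity measure, and "central" is taken to mean the medoid, i.e. the record with the highest total similarity to all other records. The names `jaccard` and `select_central_graph` are illustrative and do not come from the paper.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, List, Tuple

# Assumption: a graph record is represented by its set of labelled edges
# (subject, label, object). The paper's actual representation may differ.
Edge = Tuple[Hashable, Hashable, Hashable]
Graph = FrozenSet[Edge]


def jaccard(a: Graph, b: Graph) -> float:
    """Hypothetical similarity measure: Jaccard overlap of two edge sets."""
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(a | b)


def select_central_graph(records: List[Graph]) -> Graph:
    """Pick the medoid: the record whose summed similarity to the others is largest."""
    if not records:
        raise ValueError("need at least one record")
    totals = [0.0] * len(records)
    # Each unordered pair is scored once and credited to both members.
    for i, j in combinations(range(len(records)), 2):
        s = jaccard(records[i], records[j])
        totals[i] += s
        totals[j] += s
    # max() keeps the first index on ties, so the choice is deterministic.
    best = max(range(len(records)), key=lambda k: totals[k])
    return records[best]


# Usage: three hypothetical duplicate records describing the same entity "e1".
records = [
    frozenset({("e1", "name", "Alice"), ("e1", "city", "Paris")}),
    frozenset({("e1", "name", "Alice"), ("e1", "city", "Paris"), ("e1", "age", "30")}),
    frozenset({("e1", "name", "Alicia"), ("e1", "city", "Paris")}),
]
canonical = select_central_graph(records)
print(sorted(canonical))
```

Selecting an existing record, rather than synthesizing a new graph, guarantees that the canonical result is one of the observed records; the trade-off is that it cannot combine correct fragments that are spread across several duplicates.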
Subjects 1706 Computer Science Applications
Keyword Canonicalization
Database record
Deduplication
Graph mining
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status UQ

Created: Wed, 27 Nov 2013, 21:55:19 EST by System User on behalf of School of Information Technol and Elec Engineering