Active duplicate detection

Deng, Ke, Wang, Liwei, Zhou, Xiaofang, Sadiq, Shazia and Fung, Gabriel Pui Cheong (2010). Active duplicate detection. In: Hiroyuki Kitagawa, Yoshiharu Ishikawa, Qing Li and Chiemi Watanabe (Eds.), Database Systems for Advanced Applications, Proceedings, Part I. 15th International Conference on Database Systems for Advanced Applications (DASFAA 2010), Tsukuba, Japan, 1-4 April 2010, pp. 565-579.


Author Deng, Ke
Wang, Liwei
Zhou, Xiaofang
Sadiq, Shazia
Fung, Gabriel Pui Cheong
Title of paper Active duplicate detection
Conference Paper Type Fully Published Paper
Conference name 15th International Conference on Database Systems for Advanced Applications (DASFAA 2010)    (ERA 2010 Rank A)
DOI 10.1007/978-3-642-12026-8_43
Conference location Tsukuba, Japan
Conference dates 1-4 April 2010
Proceedings title Database Systems for Advanced Applications, Proceedings, Part I (ERA 2012 Listed)
Journal name Lecture Notes in Computer Science (ERA 2012 Listed)
Editor Hiroyuki Kitagawa
Yoshiharu Ishikawa
Qing Li
Chiemi Watanabe
Place published Berlin, Germany
Publisher Springer-Verlag
Publication date 2010
Volume number 5981
ISBN 9783642120251
ISSN 0302-9743
Start page 565
End page 579
Total pages 15
Collection year 2011
Language eng
Abstract/Summary The aim of duplicate detection is to group records in a relation that refer to the same real-world entity, such as a person or a business. Most existing works require user-specified parameters, such as a similarity threshold, in order to conduct duplicate detection. These methods are called user-first in this paper. However, in many scenarios, pre-specification by the user is very hard and often unreliable, which limits the applicability of user-first methods. In this paper, we propose a user-last method, called Active Duplicate Detection (ADD), in which an initial solution is returned without forcing the user to specify such parameters, and the user is then involved in refining that initial solution. Unlike user-first methods, where the user makes decisions before any processing, ADD allows the user to make decisions based on an initial solution. The initial solution identified by ADD is of comparatively high quality and is easy to refine in a systematic way (at almost zero cost).
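The abstract contrasts ADD with user-first methods, which need a similarity threshold before any processing. The sketch below illustrates only that user-first baseline, not ADD itself, to show why the up-front choice matters. The similarity function (Python's difflib ratio), the union-find grouping, the function names and the example records are all assumptions made for illustration and do not come from the paper.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Assumed stand-in for a record matcher: difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(records: list[str], threshold: float) -> list[list[str]]:
    """Hypothetical user-first grouping: link every pair whose similarity
    is >= the user-supplied threshold, then return the connected
    components (via union-find) as duplicate groups."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    groups: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


if __name__ == "__main__":
    people = ["John Smith", "Jon Smith", "J. Smith", "Mary Jones"]
    # The grouping changes with the threshold the user must pick up front.
    for t in (0.9, 0.8):
        print(t, group_duplicates(people, t))
```

With these example strings, a threshold of 0.8 puts all three Smith variants in one group, while 0.9 leaves "J. Smith" on its own; a user-first method has to commit to one of these values before seeing either result, which is the difficulty the user-last approach is meant to avoid.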
Subjects E1
0906 Electrical and Electronic Engineering
Keyword Linkage
Records
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Conference Paper
Sub-type: Fully Published Paper
Collections: Official 2011 Collection
School of Information Technology and Electrical Engineering Publications
Citation counts: Cited 0 times in Thomson Reuters Web of Science
Cited 0 times in Scopus
Access Statistics: 146 Abstract Views, 68 File Downloads
Created: Sun, 01 Aug 2010, 00:09:18 EST