Active duplicate detection

Deng, Ke, Wang, Liwei, Zhou, Xiaofang, Sadiq, Shazia and Fung, Gabriel Pui Cheong (2010). Active duplicate detection. In: Hiroyuki Kitagawa, Yoshiharu Ishikawa, Qing Li and Chiemi Watanabe, Database Systems for Advanced Applications, Proceedings, Part I. 15th International Conference on Database Systems for Advanced Applications (DASFAA 2010), Tsukuba, Japan, (565-579). 1-4 April 2010.


Author Deng, Ke
Wang, Liwei
Zhou, Xiaofang
Sadiq, Shazia
Fung, Gabriel Pui Cheong
Title of paper Active duplicate detection
Conference name 15th International Conference on Database Systems for Advanced Applications (DASFAA 2010)
Conference location Tsukuba, Japan
Conference dates 1-4 April 2010
Proceedings title Database Systems for Advanced Applications, Proceedings, Part I
Journal name Lecture Notes in Computer Science
Place of Publication Berlin, Germany
Publisher Springer-Verlag
Publication Year 2010
Sub-type Fully published paper
DOI 10.1007/978-3-642-12026-8_43
ISBN 9783642120251
ISSN 0302-9743
Editor Hiroyuki Kitagawa
Yoshiharu Ishikawa
Qing Li
Chiemi Watanabe
Volume 5981
Start page 565
End page 579
Total pages 14
Collection year 2011
Language eng
Abstract/Summary The aim of duplicate detection is to group records in a relation that refer to the same real-world entity, such as a person or business. Most existing methods require user-specified parameters, such as a similarity threshold, in order to conduct duplicate detection. These methods are called user-first in this paper. However, in many scenarios, pre-specification by the user is very hard and often unreliable, which limits the applicability of user-first methods. In this paper, we propose a user-last method, called Active Duplicate Detection (ADD), in which an initial solution is returned without forcing the user to specify such parameters, and the user is then involved in refining the initial solution. Unlike user-first methods, where the user makes decisions before any processing, ADD allows the user to make decisions based on an initial solution. The initial solution identified by ADD is of comparatively high quality and is easy to refine in a systematic way (at almost zero cost).
Subjects E1
0906 Electrical and Electronic Engineering
Keyword Linkage
Records
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

 
Citation counts: TR Web of Science Citation Count  Cited 0 times in Thomson Reuters Web of Science Article
Scopus Citation Count Cited 0 times in Scopus Article
Access Statistics: 166 Abstract Views, 68 File Downloads
Created: Sun, 01 Aug 2010, 00:09:18 EST