Addressing instance ambiguity in web harvesting

Li, Zhixu, Zhang, XiangLiang, Huang, Hai, Xie, Qing, Zhu, Jia and Zhou, Xiaofang (2015). Addressing instance ambiguity in web harvesting. In: Julia Stoyanovich and Fabian M. Suchanek, 18th International Workshop on the Web and Databases, WebDB 2015: Freshness, Correctness, Quality of Information and Knowledge on the Web - Proceedings. 18th International Workshop on the Web and Databases, WebDB 2015, Melbourne, Australia, (6-12). 31 May 2015. doi:10.1145/2767109.2767114


Author Li, Zhixu; Zhang, XiangLiang; Huang, Hai; Xie, Qing; Zhu, Jia; Zhou, Xiaofang
Title of paper Addressing instance ambiguity in web harvesting
Conference name 18th International Workshop on the Web and Databases, WebDB 2015
Conference location Melbourne, Australia
Conference dates 31 May 2015
Convener Julia Stoyanovich
Proceedings title 18th International Workshop on the Web and Databases, WebDB 2015: Freshness, Correctness, Quality of Information and Knowledge on the Web - Proceedings
Journal name 18th International Workshop on the Web and Databases, WebDB 2015: Freshness, Correctness, Quality of Information and Knowledge on the Web - Proceedings
Place of Publication New York NY, United States
Publisher The Association for Computing Machinery
Publication Year 2015
Year available 2015
Sub-type Fully published paper
DOI 10.1145/2767109.2767114
Open Access Status Not Open Access
ISBN 9781450336277
Editor Julia Stoyanovich; Fabian M. Suchanek
Start page 6
End page 12
Total pages 7
Chapter number 2
Total chapters 10
Collection year 2016
Language eng
Abstract/Summary Web Harvesting enables the enrichment of incomplete data sets by retrieving required information from the Web. However, the ambiguity of instances may greatly decrease the quality of the harvested data, given that any instance in the local data set may become ambiguous when attempting to identify it on the Web. Although plenty of disambiguation methods have been proposed to deal with ambiguity problems in various settings, none of them is able to handle the instance ambiguity problem in Web Harvesting. In this paper, we propose to perform instance disambiguation in Web Harvesting with a novel disambiguation method inspired by the idea of collaborative identity recognition. In particular, we expect to find some common properties, in the form of latent shared attribute values, among instances in the list, such that these shared attribute values can differentiate instances within the list from the ambiguous ones on the Web. Our extensive experimental evaluation illustrates the utility of collaborative disambiguation for a popular Web Harvesting application, and shows that it substantially improves the accuracy of the harvested data.
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status UQ

 
Created: Tue, 22 Mar 2016, 03:11:33 EST by System User on behalf of School of Information Technol and Elec Engineering