Exploiting structural similarity for automatic information extraction from lists

Huynh, Dat T., Xu, Jiajie, Sadiq, Shazia and Zhou, Xiaofang (2013). Exploiting structural similarity for automatic information extraction from lists. In: Web Information Systems Engineering, WISE 2013 - 14th International Conference, Proceedings. 14th International Conference on Web Information Systems Engineering, WISE 2013, Nanjing, China, (202-215). 13-15 October 2013. doi:10.1007/978-3-642-41154-0_15


Author Huynh, Dat T.
Xu, Jiajie
Sadiq, Shazia
Zhou, Xiaofang
Title of paper Exploiting structural similarity for automatic information extraction from lists
Conference name 14th International Conference on Web Information Systems Engineering, WISE 2013
Conference location Nanjing, China
Conference dates 13-15 October 2013
Proceedings title Web Information Systems Engineering, WISE 2013 - 14th International Conference, Proceedings
Journal name Lecture Notes in Computer Science
Place of Publication Heidelberg, Germany
Publisher Springer
Publication Year 2013
Year available 2013
Sub-type Fully published paper
DOI 10.1007/978-3-642-41154-0_15
Open Access Status
ISBN 9783642411533
9783642411540
ISSN 0302-9743
1611-3349
Volume 8181
Issue PART 2
Start page 202
End page 215
Total pages 14
Collection year 2014
Language eng
Abstract/Summary In this paper, we propose a novel technique to reduce the dependency on a knowledge base of ONDUX, the current state-of-the-art method for information extraction by text segmentation. While the existing approach relies mainly on a high overlap between pre-existing data and input lists to build an extraction model, our approach exploits the structural similarity of text segments across the sequences of a list to align them into groups, achieving effective extraction with low dependency on pre-existing data. First, we propose a structural similarity measure between text segments and combine it with content similarity to assess how likely it is that two text segments in a list should be aligned in the same group. Then we devise a data shifting-alignment technique that uses positional information and the similarity scores to cluster text segments into groups before their labels are revised by an HMM-based graphical model. Experimental results on different datasets demonstrate that our method extracts information from lists with high performance and depends less on the knowledge base than the current state-of-the-art method.
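The abstract describes the method only at a high level, so the following Python sketch illustrates the general idea under our own assumptions. The character-class `shape` signature (structural similarity), the word-level Jaccard score (content similarity), the weight `alpha`, and the order-preserving dynamic-programming alignment against the longest record (a stand-in for positional shifting-alignment) are all hypothetical choices, not the authors' published measures. The HMM-based label revision step is omitted entirely.

```python
import re
from difflib import SequenceMatcher


def shape(segment: str) -> str:
    """Hypothetical structural signature: map each character to a class
    (upper -> 'A', lower -> 'a', digit -> '9', others kept) and collapse runs."""
    out = []
    for ch in segment:
        if ch.isupper():
            c = "A"
        elif ch.islower():
            c = "a"
        elif ch.isdigit():
            c = "9"
        else:
            c = ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def structural_sim(a: str, b: str) -> float:
    # Similarity of the two shape signatures, in [0, 1].
    return SequenceMatcher(None, shape(a), shape(b)).ratio()


def content_sim(a: str, b: str) -> float:
    # Jaccard overlap of lowercased word tokens (an assumed content measure).
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def combined_sim(a: str, b: str, alpha: float = 0.5) -> float:
    # Hypothetical linear combination of structural and content similarity.
    return alpha * structural_sim(a, b) + (1 - alpha) * content_sim(a, b)


def align(records: list[list[str]], alpha: float = 0.5) -> list[list[str]]:
    """Assign each record's segments to groups (columns) seeded by the longest
    record, preserving segment order and maximizing total combined similarity."""
    template = max(records, key=len)
    k = len(template)
    groups: list[list[str]] = [[] for _ in range(k)]
    neg = float("-inf")
    for rec in records:
        m = len(rec)
        # best[i][j]: best score placing the first i segments into the first j columns.
        best = [[neg] * (k + 1) for _ in range(m + 1)]
        for j in range(k + 1):
            best[0][j] = 0.0
        for i in range(1, m + 1):
            for j in range(i, k + 1):
                skip = best[i][j - 1]  # leave column j empty for this record
                take = best[i - 1][j - 1] + combined_sim(rec[i - 1], template[j - 1], alpha)
                best[i][j] = max(skip, take)
        # Backtrack: skip a column when doing so keeps the optimal score.
        i, j = m, k
        while i > 0:
            if best[i][j] == best[i][j - 1]:
                j -= 1
            else:
                groups[j - 1].append(rec[i - 1])
                i -= 1
                j -= 1
    return groups


if __name__ == "__main__":
    records = [
        ["Smith, J.", "Mining web lists", "WISE", "2013"],
        ["Lee, K.", "WISE", "2012"],
    ]
    for column in align(records):
        print(column)
```

In this toy list, the shorter second record has no title segment, so its venue and year "shift" right to line up with the structurally and textually similar columns of the longer record, leaving the title column with a single entry. A real system would then assign labels to these groups, which the abstract attributes to an HMM-based graphical model.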
Subjects 1700 Computer Science
2614 Theoretical Computer Science
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

 
Created: Tue, 03 Dec 2013, 00:37:53 EST by System User on behalf of School of Information Technol and Elec Engineering