Using silver and semi-gold standard corpora to compare open named entity recognisers

Groza, Tudor, Oellrich, Anika and Collier, Nigel (2013). Using silver and semi-gold standard corpora to compare open named entity recognisers. In: Guo-Zheng Li, Xiaohua Hu, Sunghoon Kim, Habtom Ressom, Michael Hughes, Baoyan Liu, Geoff McLachlan, Michael Liebman and Hongye Sun, Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on. 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013, Shanghai, China, (481-485). 18 - 21 December 2013. doi:10.1109/BIBM.2013.6732541


Author Groza, Tudor
Oellrich, Anika
Collier, Nigel
Title of paper Using silver and semi-gold standard corpora to compare open named entity recognisers
Conference name 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Conference location Shanghai, China
Conference dates 18 - 21 December 2013
Proceedings title Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Journal name Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Place of Publication Piscataway, NJ United States
Publisher IEEE
Publication Year 2013
Year available 2013
Sub-type Fully published paper
DOI 10.1109/BIBM.2013.6732541
ISBN 9781479913091
Editor Guo-Zheng Li
Xiaohua Hu
Sunghoon Kim
Habtom Ressom
Michael Hughes
Baoyan Liu
Geoff McLachlan
Michael Liebman
Hongye Sun
Start page 481
End page 485
Total pages 5
Collection year 2014
Language eng
Abstract/Summary Ontologies have become a central resource for defining biomedical concepts, but linkage to and from textual data is still an unresolved problem. In this paper we approach the task of concept recognition in text by comparing four extant systems (cTAKES, NCBO Annotator, BeCAS and MetaMap) with default parameter settings. The systems are compared on benchmark data consisting of 2,163 scientific abstracts and 906 clinical trial reports using an automatically constructed "silver" standard and a random semi-gold standard evaluation methodology. Furthermore, evaluation is conducted on the basis of specific concept identifiers. Experimental results show: (i) generally higher levels of concept recognition on clinical trial reports than on scientific abstracts; (ii) the best performing system we observed on the silver standard was cTAKES on both the abstract and clinical trial corpora; however, NCBO Annotator performed better when considering only the selected broad semantic types; (iii) BeCAS and MetaMap had a tendency to produce coarser-grained annotations; (iv) the random semi-gold evaluation places an upper bound on the performance of systems. This shows broad agreement with the silver standard evaluation but highlights areas where the silver standard methodology might be improved.
Subjects 291500 Biomedical Engineering
Keyword Biomedical engineering
Ontologies
Q-Index Code E1
Q-Index Status Confirmed Code
Institutional Status UQ

 
Citation counts: Cited 3 times in Scopus
Created: Tue, 11 Mar 2014, 02:27:07 EST by System User on behalf of School of Information Technol and Elec Engineering