PhenoMiner: from text to a database of phenotypes associated with OMIM diseases

Collier, Nigel, Groza, Tudor, Smedley, Damian, Robinson, Peter N., Oellrich, Anika and Rebholz-Schuhmann, Dietrich (2015) PhenoMiner: from text to a database of phenotypes associated with OMIM diseases. Database-The Journal of Biological Databases and Curation, 2015 1: 1-12. doi:10.1093/database/bav104

Author Collier, Nigel
Groza, Tudor
Smedley, Damian
Robinson, Peter N.
Oellrich, Anika
Rebholz-Schuhmann, Dietrich
Title PhenoMiner: from text to a database of phenotypes associated with OMIM diseases
Journal name Database-The Journal of Biological Databases and Curation
ISSN 1758-0463
Publication date 2015-10-01
Sub-type Article (original research)
DOI 10.1093/database/bav104
Open Access Status DOI
Volume 2015
Issue 1
Start page 1
End page 12
Total pages 12
Place of publication Oxford, United Kingdom
Publisher Oxford University Press
Language eng
Abstract Analysis of scientific and clinical phenotypes reported in the experimental literature has been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggle with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13 636 phenotype candidates. We identified 28 155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-gene pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) are available as supplementary data and can be downloaded under a Creative Commons Attribution 4.0 license.
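The abstract names Apriori association mining as the step that links extracted phenotypes to disorders. The following is a minimal sketch of that general technique, not the authors' implementation: the function name `mine_pairs`, the thresholds, the input layout (one pair of phenotype and disorder sets per document) and the example terms are all assumptions made for illustration. It applies the Apriori pruning idea (a pair can only be frequent if each of its items is frequent) and then scores each surviving phenotype-disorder pair by support and confidence.

```python
from collections import Counter
from itertools import product


def mine_pairs(docs, min_support=2, min_confidence=0.5):
    """Return (phenotype, disorder, support, confidence) tuples.

    docs is a list of (phenotypes, disorders) pairs, each a set of
    strings mentioned in one document. All names and thresholds here
    are hypothetical, chosen only to illustrate the technique.
    """
    # Pass 1: count how many documents mention each single item.
    pheno_counts = Counter(p for phenos, _ in docs for p in phenos)
    dis_counts = Counter(d for _, dis in docs for d in dis)

    # Apriori pruning: discard items that are infrequent on their own,
    # since no pair containing them can reach min_support.
    frequent_phenos = {p for p, c in pheno_counts.items() if c >= min_support}
    frequent_dis = {d for d, c in dis_counts.items() if c >= min_support}

    # Pass 2: count co-occurrences of the surviving candidates only.
    pair_counts = Counter()
    for phenos, dis in docs:
        for pair in product(phenos & frequent_phenos, dis & frequent_dis):
            pair_counts[pair] += 1

    rules = []
    for (pheno, disorder), support in pair_counts.items():
        # Confidence of the rule "phenotype -> disorder".
        confidence = support / pheno_counts[pheno]
        if support >= min_support and confidence >= min_confidence:
            rules.append((pheno, disorder, support, confidence))

    # Strongest rules first; ties broken alphabetically for stability.
    return sorted(rules, key=lambda r: (-r[2], -r[3], r[0], r[1]))


# Toy input, invented for illustration; not PhenoMiner output.
example_docs = [
    ({"short stature", "scoliosis"}, {"Marfan syndrome"}),
    ({"scoliosis"}, {"Marfan syndrome"}),
    ({"short stature"}, {"Noonan syndrome"}),
    ({"short stature", "ptosis"}, {"Noonan syndrome"}),
]
rules = mine_pairs(example_docs)
for pheno, disorder, support, confidence in rules:
    print(f"{pheno} -> {disorder}: support={support}, confidence={confidence:.2f}")
```

Support here is a raw document count rather than a fraction of the collection; a real pipeline over a large corpus would likely normalise it and tune both thresholds empirically.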
Keyword Ontology
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: Official 2016 Collection
School of Information Technology and Electrical Engineering Publications
Citation counts: TR Web of Science Citation Count: Cited 4 times in Thomson Reuters Web of Science
Scopus Citation Count: Cited 5 times in Scopus
Created: Sun, 22 Nov 2015, 10:19:19 EST by System User on behalf of Scholarly Communication and Digitisation Service