Automatic, context-specific generation of Gene Ontology slims

Davis, Melissa J., Sehgal, Muhammad Shoaib B. and Ragan, Mark A. (2010) Automatic, context-specific generation of Gene Ontology slims. BMC Bioinformatics, 11 498-1-498-13. doi:10.1186/1471-2105-11-498

Author Davis, Melissa J.
Sehgal, Muhammad Shoaib B.
Ragan, Mark A.
Title Automatic, context-specific generation of Gene Ontology slims
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2010-10-07
Sub-type Article (original research)
DOI 10.1186/1471-2105-11-498
Open Access Status DOI
Volume 11
Start page 498-1
End page 498-13
Total pages 13
Editor Melissa Norton
Place of publication London, U.K.
Publisher BioMed Central
Collection year 2011
Language eng
Formatted abstract
Background: The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and as a consequence has grown to contain many terms, making it unwieldy for many applications. To increase its ease of manipulation and efficiency of use, subsets called GO slims are often created by collapsing terms upward into more general, high-level terms relevant to a particular context. Creation of a GO slim currently requires manipulation and editing of GO by an expert (or community) familiar with both the ontology and the biological context. Decisions about which terms to include are necessarily subjective, and the creation process itself and subsequent curation are time-consuming and largely manual.

Results: Here we present an objective framework for generating customised ontology slims for specific annotated datasets, exploiting information latent in the structure of the ontology graph and in the annotation data. This framework combines ontology engineering approaches with a data-driven algorithm that draws on graph and information theory. We illustrate this method by application to GO, generating GO slims at different information thresholds, characterising their depth of semantics and demonstrating the resulting gains in statistical power.

Conclusions: Our GO slim creation pipeline is available for use in conjunction with any GO-annotated dataset, and creates dataset-specific, objectively defined slims. This method is fast and scalable for application to other biomedical ontologies.
© 2010 Davis et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keyword Semantic similarity
Budding yeast
Gene Ontology (GO)
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Article number 498

Document type: Journal Article
Sub-type: Article (original research)
Collections: Official 2011 Collection
Institute for Molecular Bioscience - Publications
Citation counts: TR Web of Science Citation Count  Cited 14 times in Thomson Reuters Web of Science
Scopus Citation Count Cited 15 times in Scopus
Created: Tue, 01 Feb 2011, 10:27:47 EST by Susan Allen on behalf of Institute for Molecular Bioscience