Motif Enrichment Analysis: A unified framework and an evaluation on ChIP data

McLeay, Robert C. and Bailey, Timothy L. (2010) Motif Enrichment Analysis: A unified framework and an evaluation on ChIP data. BMC Bioinformatics, 11 165-1-165-11. doi:10.1186/1471-2105-11-165


Author McLeay, Robert C.
Bailey, Timothy L.
Title Motif Enrichment Analysis: A unified framework and an evaluation on ChIP data
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2010-04
Sub-type Article (original research)
DOI 10.1186/1471-2105-11-165
Open Access Status DOI
Volume 11
Start page 165-1
End page 165-11
Total pages 11
Editor Melissa Norton
Place of publication London, U.K.
Publisher BioMed Central
Collection year 2011
Language eng
Formatted abstract
Background: A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs in the genes' regulatory regions. Typically, the biologist specifies a set of genes believed to be co-regulated and a library of known DNA-binding models for transcription factors, and MEA determines which (if any) of the factors may be direct regulators of the genes. Since the number of factors with known DNA-binding models is rapidly increasing as a result of high-throughput technologies, MEA is becoming increasingly useful. In this paper, we explore ways to make MEA applicable in more settings, and evaluate the efficacy of a number of MEA approaches.
Results: We first define a mathematical framework for Motif Enrichment Analysis that relaxes the requirement that the biologist input a selected set of genes. Instead, the input consists of all regulatory regions, each labeled with the level of a biological signal. We then define and implement a number of motif enrichment analysis methods. Some of these methods require a user-specified signal threshold, some identify an optimum threshold in a data-driven way, and two of our methods are threshold-free. We evaluate these methods, along with two existing methods (Clover and PASTAA), using yeast ChIP-chip data. Our novel threshold-free method based on linear regression performs best in our evaluation, followed by the data-driven PASTAA algorithm. The Clover algorithm performs as well as PASTAA if the user-specified threshold is chosen optimally. Data-driven methods based on three statistical tests (Fisher Exact Test, rank-sum test, and multi-hypergeometric test) perform poorly, even when the threshold is chosen optimally. These methods (and Clover) perform even worse when unrestricted data-driven threshold determination is used.
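The contrast between a fixed-threshold test and a threshold-free regression can be sketched as follows. This is a minimal illustration only, not the authors' AME implementation: the abstract does not specify how the regression or the tests are set up, so the function names, the toy data, the per-region motif scores, and the 0.5 cutoff used to call a motif "hit" are all assumptions made for this example.

```python
from math import comb, sqrt


def fisher_enrichment_pvalue(signals, motif_hits, threshold):
    """One-sided Fisher exact test (hypergeometric upper tail).

    Asks whether regions containing the motif are over-represented
    among regions whose signal is at or above the user-chosen threshold.
    """
    n_total = len(signals)
    n_hits = sum(motif_hits)                      # regions containing the motif
    selected = [s >= threshold for s in signals]
    n_selected = sum(selected)                    # regions at or above threshold
    k = sum(1 for sel, hit in zip(selected, motif_hits) if sel and hit)
    upper = min(n_hits, n_selected)
    # P(X >= k) when n_selected regions are drawn without replacement
    tail = sum(
        comb(n_hits, i) * comb(n_total - n_hits, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_selected)


def regression_fit(signals, motif_scores):
    """Least-squares slope and Pearson r of signal regressed on motif score.

    No signal threshold is needed: every region contributes to the fit.
    """
    n = len(signals)
    mean_x = sum(motif_scores) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in motif_scores)
    syy = sum((y - mean_y) ** 2 for y in signals)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(motif_scores, signals))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Toy data (hypothetical): one signal level and one motif score per region.
signals = [2.9, 2.4, 2.1, 0.4, 0.3, 0.2, 0.1, 0.0]
motif_scores = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.2, 0.1]
motif_hits = [score >= 0.5 for score in motif_scores]  # assumed hit cutoff

# Fixed-threshold approach: the result depends on the chosen threshold.
p_value = fisher_enrichment_pvalue(signals, motif_hits, threshold=1.0)

# Threshold-free approach: uses the full range of signal values.
slope, r = regression_fit(signals, motif_scores)

print(f"Fisher p-value at threshold 1.0: {p_value:.4f}")
print(f"Regression slope: {slope:.3f}, Pearson r: {r:.3f}")
```

In this toy setting the Fisher result changes whenever the threshold argument moves across a signal value, whereas the regression fit has no such parameter, which is the practical distinction the Results paragraph draws between the two families of methods.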
Conclusions: Our novel, threshold-free linear regression method works well on ChIP-chip data. Methods using data-driven threshold determination can perform poorly unless the range of thresholds is limited a priori. The limits implemented in PASTAA, however, appear to be well chosen. Our novel algorithms, AME (Analysis of Motif Enrichment), are available at http://bioinformatics.org.au/ame/.
© 2010 McLeay and Bailey; licensee BioMed Central Ltd.
Keyword Factor-binding sites
Transcription factors
Over-representation
Regulatory motifs
Coexpressed genes
Target genes
Identification
Genome
Yeast
Expression
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Article number 165, pp.1-11

Document type: Journal Article
Collections: Official 2011 Collection
ERA 2012 Admin Only
Institute for Molecular Bioscience - Publications
Citation counts: Cited 40 times in Thomson Reuters Web of Science
Cited 48 times in Scopus
Created: Sun, 06 Jun 2010, 00:07:29 EST