Epigenetic priors for identifying active transcription factor binding sites

Cuellar-Partida, Gabriel, Buske, Fabian A., McLeay, Robert C., Whitington, Tom, Noble, William Stafford and Bailey, Timothy L. (2012) Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics, 28 1: 56-62. doi:10.1093/bioinformatics/btr614


Author Cuellar-Partida, Gabriel
Buske, Fabian A.
McLeay, Robert C.
Whitington, Tom
Noble, William Stafford
Bailey, Timothy L.
Title Epigenetic priors for identifying active transcription factor binding sites
Journal name Bioinformatics
ISSN 1367-4803
1367-4811
Publication date 2012-01
Year available 2011
Sub-type Article (original research)
DOI 10.1093/bioinformatics/btr614
Open Access Status Not Open Access
Volume 28
Issue 1
Start page 56
End page 62
Total pages 7
Place of publication Oxford, United Kingdom
Publisher Oxford University Press
Collection year 2012
Language eng
Formatted abstract
Motivation
Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Incorporating epigenetic data, such as histone modification and DNase I accessibility data, has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.

Results

We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.
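The scoring idea above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not FIMO's implementation. The motif is a position weight matrix of base probabilities, the background is a 0-order model, and only the forward strand is scanned. `signal_to_prior` (a linear rescaling of an epigenetic track into a hypothetical range `pi_min`..`pi_max`) and `binary_filter` (a simple signal cutoff) are illustrative stand-ins chosen for this sketch, not the paper's procedures. The combination step follows directly from Bayes' rule: the log-posterior odds of a site equal the motif log-likelihood ratio plus the log prior odds, log(π / (1 - π)), where π is the position-specific prior.

```python
import math


def pwm_log_odds(site, pwm, background):
    """Motif log-likelihood ratio (natural log) of `site` against the background.

    `pwm` is a list of dicts, one per motif column, mapping base -> probability;
    `background` maps base -> probability under a 0-order background model.
    """
    return sum(math.log(col[base] / background[base]) for col, base in zip(pwm, site))


def signal_to_prior(signal, pi_min=1e-4, pi_max=0.1):
    """HYPOTHETICAL mapping from an epigenetic signal track to per-position priors.

    Linearly rescales the signal into [pi_min, pi_max]. This is an illustrative
    assumption, not the transformation described in the paper.
    """
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # a flat track maps every position to pi_min
    return [pi_min + (pi_max - pi_min) * (s - lo) / span for s in signal]


def log_posterior_odds(llr, prior):
    """Bayes' rule in log-odds form: log posterior odds = LLR + log prior odds."""
    return llr + math.log(prior / (1.0 - prior))


def binary_filter(llr_scores, signal, cutoff):
    """For contrast (hypothetical): keep a motif score only where signal >= cutoff."""
    return [llr if s >= cutoff else None for llr, s in zip(llr_scores, signal)]


def scan(sequence, pwm, background, priors):
    """Log-posterior odds for every start position (forward strand only)."""
    width = len(pwm)
    scores = []
    for i in range(len(sequence) - width + 1):
        llr = pwm_log_odds(sequence[i:i + width], pwm, background)
        scores.append(log_posterior_odds(llr, priors[i]))
    return scores


if __name__ == "__main__":
    background = {b: 0.25 for b in "ACGT"}
    pwm = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    ]
    sequence = "AGTAG"
    signal = [0.0, 1.0, 2.0, 8.0, 9.0]  # made-up track values, one per base
    priors = signal_to_prior(signal)
    for pos, score in enumerate(scan(sequence, pwm, background, priors)):
        print(pos, sequence[pos:pos + 2], round(score, 3))
```

In this toy run the two identical "AG" matches have the same motif log-likelihood ratio, but the one under the stronger signal receives the higher log-posterior odds score. A binary filter, by contrast, can only keep or discard a match at a fixed cutoff; the prior instead shifts every score continuously.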

Availability and implementation
FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011.

Keyword Cis-regulatory modules
DNase I hypersensitive sites
Human genome
DNase I
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Collections: Official 2012 Collection
Institute for Molecular Bioscience - Publications
 
Citation counts: Cited 22 times in Thomson Reuters Web of Science
Cited 25 times in Scopus
Created: Tue, 24 Jan 2012, 14:42:33 EST by Susan Allen on behalf of Institute for Molecular Bioscience