Tissue-specific prediction of directly regulated genes

McLeay, Robert C., Leat, Chris J. and Bailey, Timothy L. (2011) Tissue-specific prediction of directly regulated genes. Bioinformatics, 27 17: 2354-2360. doi:10.1093/bioinformatics/btr399

Author McLeay, Robert C.
Leat, Chris J.
Bailey, Timothy L.
Title Tissue-specific prediction of directly regulated genes
Journal name Bioinformatics
ISSN 1367-4803
Publication date 2011-09
Sub-type Article (original research)
DOI 10.1093/bioinformatics/btr399
Volume 27
Issue 17
Start page 2354
End page 2360
Total pages 7
Editor Alfonso Valencia
Place of publication Oxford, U.K.
Publisher Oxford University Press
Collection year 2012
Language eng
Formatted abstract
Motivation: Direct binding by a transcription factor (TF) to the proximal promoter of a gene is strong evidence that the TF regulates the gene. Assaying the genome-wide binding of every TF in every cell type and condition is currently impractical. Histone modifications correlate with tissue/cell/condition-specific (‘tissue-specific’) TF binding, so histone ChIP-seq data can be combined with traditional position weight matrix (PWM) methods to make tissue-specific predictions of TF–promoter interactions.
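
To make the PWM side of this idea concrete, here is a minimal sketch of scoring a promoter by its best log-odds match to a motif. The three-column matrix, the uniform background frequency, and the function name `best_pwm_score` are assumptions for illustration only; they are not taken from the paper or from DR GENE.

```python
import math

# Hypothetical position probability matrix: one dict of base probabilities
# per motif position. These values are invented for illustration.
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def best_pwm_score(seq, ppm=PPM, background=BACKGROUND):
    """Return the highest log-odds score over all motif-width windows of seq."""
    width = len(ppm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    best = -math.inf
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Sum of per-position log2(probability / background) for this window.
        score = sum(
            math.log2(ppm[i][base] / background)
            for i, base in enumerate(window)
        )
        best = max(best, score)
    return best
```

A score of this kind depends only on the sequence, so it is the same in every tissue; that is why the abstract pairs it with histone ChIP-seq data.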

Results: We use supervised learning to train a naïve Bayes predictor of TF–promoter binding. The predictor's features are the histone modification levels and a PWM-based score for the promoter. Training and testing use sets of promoters labeled using TF ChIP-seq data, and we use cross-validation on 23 such datasets to measure the accuracy. A PWM+histone naïve Bayes predictor using a single histone modification (H3K4me3) is substantially more accurate than a PWM score or a conservation-based score (phylogenetic motif model). The naïve Bayes predictor is more accurate (on average) at all sensitivity levels, and makes only half as many false positive predictions at sensitivity levels from 10% to 80%. On average, it correctly predicts 80% of bound promoters at a false positive rate of 20%. Accuracy does not diminish when we test the predictor in a different cell type (and species) from training. Accuracy is barely diminished even when we train the predictor without using TF ChIP-seq data.
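
The two-feature predictor described above can be sketched roughly as follows. This is not the DR GENE implementation: it assumes Gaussian class-conditional likelihoods (the abstract does not say how feature values are modeled), and the toy promoters, their feature values, and all function names are made up for illustration.

```python
import math


def fit_gaussian_nb(features, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(features, labels) if y == cls]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(labels), means, variances)
    return model


def log_posterior(model, x):
    """Unnormalized log posterior per class, treating features as independent."""
    scores = {}
    for cls, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, var in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[cls] = lp
    return scores


def predict(model, x):
    """Return the class with the highest log posterior."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


# Invented toy promoters: (PWM score, H3K4me3 level); 1 = bound, 0 = unbound.
X = [(5.1, 8.0), (4.2, 7.5), (4.8, 9.1), (1.0, 1.2), (4.5, 0.8), (0.5, 2.0)]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
```

In this toy setup, `predict(model, (4.6, 1.0))` classifies a promoter with a strong motif match but a low histone signal as unbound, which is the kind of false positive a PWM score alone cannot rule out.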

Availability: Our tissue-specific predictor of promoters bound by a TF is called DR GENE and is available at http://bioinformatics.org.au/drgene.
Keyword Factor-binding-sites
Human genome
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Collections: Official 2012 Collection
Institute for Molecular Bioscience - Publications
Citation counts: Thomson Reuters Web of Science: cited 3 times
Scopus: cited 3 times
Created: Sun, 11 Sep 2011, 21:27:55 EST by System User on behalf of Institute for Molecular Bioscience