The statistical power of phylogenetic motif models

Hawkins, John and Bailey, Timothy L. (2008). The statistical power of phylogenetic motif models. In: Martin Vingron and Limsoon Wong (Eds.), Lecture Notes in Computer Science: Research in Computational Molecular Biology, 12th Annual International Conference, RECOMB 2008, Singapore, March 30 – April 2, 2008, Proceedings. 12th Annual International Conference, RECOMB 2008, Singapore, (112-126). 30 March – 2 April 2008. doi:10.1007/978-3-540-78839-3_10

Author Hawkins, John
Bailey, Timothy L.
Title of paper The statistical power of phylogenetic motif models
Conference name 12th Annual International Conference, RECOMB 2008
Conference location Singapore
Conference dates 30 March – 2 April 2008
Proceedings title Lecture Notes in Computer Science: Research in Computational Molecular Biology, 12th Annual International Conference, RECOMB 2008, Singapore, March 30 – April 2, 2008, Proceedings
Journal name Research in Computational Molecular Biology, Proceedings
Place of Publication Berlin, Germany
Publisher Springer
Publication Year 2008
Sub-type Fully published paper
DOI 10.1007/978-3-540-78839-3_10
Open Access Status Not yet assessed
ISBN 978-3-540-78838-6
ISSN 0302-9743
Editor Martin Vingron
Limsoon Wong
Volume 4955
Start page 112
End page 126
Total pages 15
Language eng
Abstract/Summary One component of the genomic program controlling the transcriptional regulation of genes is the location and arrangement of transcription factors bound to the promoter and enhancer regions of a gene. Because the genomic locations of the functional binding sites of most transcription factors are not yet known, predicting them is of great importance. Unfortunately, it is well known that the low specificity of the binding of transcription factors to DNA makes such prediction, using position-specific probability matrices (motifs) alone, subject to huge numbers of false positives. One approach to alleviating this problem has been to use phylogenetic “shadowing” or “footprinting” to remove unconserved regions of the genome from consideration. Another approach has been to combine a phylogenetic model and the site-specificity model into a single, predictive model of conserved binding sites. Both of these approaches are based on alignments of orthologous genomic regions from two or more species. In this work, we use a simplified, theoretical model to study the statistical power of the latter approach to the prediction of features such as transcription factor binding sites. We investigate the question of the number of genomes required at varying evolutionary distances to achieve specified levels of accuracy (false positive and false negative prediction rates). We show that this depends strongly on the information content of the position-specific probability matrix and on the evolutionary model. We explore the effects of modifying the structure of the phylogenetic model, and conclude that placing the target genome at the root of the tree has a negligible effect on the power predicted by the model. Hence, as it is much easier to calculate, we can use this as an approximation to phylogenetic motif scanning using real trees.
Finally, we perform an empirical study and demonstrate that the performance of current phylogenetic motif scanning programs is far from the theoretical limit of their power, leaving ample room for improvement.
Subjects C1
060405 Gene Expression (incl. Microarray and other genome-wide approaches)
970104 Expanding Knowledge in the Earth Sciences
Keyword genomic program
transcriptional regulation
phylogenetic model
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Citation counts: TR Web of Science Citation Count Cited 3 times in Thomson Reuters Web of Science
Scopus Citation Count Cited 4 times in Scopus
Created: Mon, 06 Apr 2009, 21:44:33 EST by Cody Mudgway on behalf of Institute for Molecular Bioscience