Assessing phylogenetic motif models for predicting transcription factor binding sites

Hawkins, John, Grant, Charles, Noble, William Stafford and Bailey, Timothy L. (2009). Assessing phylogenetic motif models for predicting transcription factor binding sites. In: Dan Gusfield and Anna Tramontano (eds), ISMB/ECCB 2009 Conference Proceedings. ISMB/ECCB 2009: Joint 17th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) & 8th European Conference on Computational Biology (ECCB), Stockholm, Sweden, 27 June-2 July 2009, pp. i339-i347. doi:10.1093/bioinformatics/btp201


Author Hawkins, John
Grant, Charles
Noble, William Stafford
Bailey, Timothy L.
Title of paper Assessing phylogenetic motif models for predicting transcription factor binding sites
Conference name ISMB/ECCB 2009: Joint 17th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) & 8th European Conference on Computational Biology (ECCB)
Conference location Stockholm, Sweden
Conference dates 27 June-2 July, 2009
Proceedings title ISMB/ECCB 2009 Conference Proceedings
Journal name Bioinformatics
Place of Publication Oxford, United Kingdom
Publisher Oxford University Press
Publication Year 2009
Sub-type Fully published paper
DOI 10.1093/bioinformatics/btp201
ISSN 1367-4803 (print), 1367-4811 (online)
Editor Dan Gusfield
Anna Tramontano
Volume 25
Issue 12
Start page i339
End page i347
Total pages 9
Collection year 2010
Language eng
Formatted Abstract/Summary
Motivation: A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach uses an extension of the standard position-specific motif model that incorporates phylogenetic information via a phylogenetic tree and a model of evolution. However, these phylogenetic motif models (PMMs) have never been rigorously benchmarked to determine whether they predict TFBSs more accurately than simple position weight matrix scanning.

Results: We evaluate three PMM-based prediction algorithms, each of which treats gapped alignments differently, and compare their prediction accuracy with that of a non-phylogenetic motif scanning approach. Surprisingly, all of these algorithms appear to be inferior to simple motif scanning when accuracy is measured against a gold standard of validated yeast TFBSs. However, the PMM scanners perform much better than simple motif scanning when we abandon the gold standard and instead count the statistically significant sites predicted, using column-shuffled ‘random’ motifs to measure significance. These results suggest that the common practice of measuring the accuracy of binding site predictors against collections of known sites may be dangerously misleading, because such collections may be missing ‘weak’ sites, which are exactly the type of sites needed to discriminate among predictors. We then extend our previous theoretical model of the statistical power of PMM-based prediction algorithms to allow for the loss of binding sites during evolution, and show that the extended model gives a more accurate upper bound on scanner accuracy. Finally, using this theoretical model, we introduce a new method for predicting the number of real binding sites in a genome. The results suggest that the number of true sites for a yeast TF is generally several times greater than the number of known sites listed in the Saccharomyces cerevisiae Promoter Database (SCPD). Among the three scanning algorithms that we test, the MONKEY algorithm has the highest accuracy for predicting yeast TFBSs.
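The evaluation idea in the Results, which counts predicted sites and uses column-shuffled motifs as a null, can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it is not MONKEY or any other PMM scanner. It scans single, unaligned sequences with an ordinary log-odds PWM. It also rests on several assumptions of its own: a uniform background, forward-strand scanning only, and a false discovery rate at a fixed threshold estimated as the mean hit count of shuffled motifs divided by the real motif's hit count. The function names and the toy counts are hypothetical.

```python
import math
import random

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts (list of dicts) into a log2-odds PWM.
    Assumes a uniform background; this is a simplification for illustration."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def window_scores(pwm, sequence):
    """Yield the score of every window of len(pwm) on the forward strand.
    Windows containing non-ACGT characters (e.g. N) are skipped; input is assumed uppercase."""
    w = len(pwm)
    for start in range(len(sequence) - w + 1):
        window = sequence[start:start + w]
        if all(base in BASES for base in window):
            yield sum(col[base] for col, base in zip(pwm, window))

def count_hits(pwm, sequences, threshold):
    """Number of windows across all sequences scoring at or above threshold."""
    return sum(1 for seq in sequences for s in window_scores(pwm, seq) if s >= threshold)

def shuffle_columns(pwm, rng):
    """Return a 'random' motif: the same columns in a permuted order."""
    shuffled = list(pwm)
    rng.shuffle(shuffled)
    return shuffled

def estimate_fdr(pwm, sequences, threshold, n_shuffles=100, seed=0):
    """Hypothetical significance estimate: mean shuffled-motif hits / real-motif hits."""
    rng = random.Random(seed)
    real = count_hits(pwm, sequences, threshold)
    null = [count_hits(shuffle_columns(pwm, rng), sequences, threshold)
            for _ in range(n_shuffles)]
    mean_null = sum(null) / n_shuffles
    fdr = min(1.0, mean_null / real) if real else 1.0
    return real, mean_null, fdr

if __name__ == "__main__":
    # Toy counts for a 6-column motif (invented for illustration only).
    toy_counts = [{"T": 9, "A": 1}, {"C": 10}, {"A": 8, "G": 2},
                  {"C": 10}, {"G": 9, "A": 1}, {"A": 10}]
    pwm = log_odds_pwm(toy_counts)
    rng = random.Random(1)
    background = "".join(rng.choice(BASES) for _ in range(2000))
    sequences = [background[:1000] + "TCACGA" + background[1000:]]  # one planted site
    real, mean_null, fdr = estimate_fdr(pwm, sequences, threshold=6.0)
    print(f"real hits={real}, mean shuffled hits={mean_null:.2f}, estimated FDR={fdr:.2f}")
```

Shuffling the column order keeps the same set of columns. Under an independent, identically distributed background, each window's score distribution is therefore unchanged, which is what makes shuffled motifs a fair null: they differ from the real motif only in lacking its coherent pattern. A phylogenetic scanner would replace the per-window PWM score with a likelihood computed over an alignment and a tree.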
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Citation counts Cited 9 times in Thomson Reuters Web of Science; cited 14 times in Scopus
Created: Thu, 11 Feb 2010, 15:20:19 EST by Susan Allen on behalf of Institute for Molecular Bioscience