Exploiting structural and topological information to improve prediction of RNA-protein binding sites

Maetschke, Stefan R. and Yuan, Zheng (2009) Exploiting structural and topological information to improve prediction of RNA-protein binding sites. BMC Bioinformatics, 10 341.1-341.14. doi:10.1186/1471-2105-10-341

Author Maetschke, Stefan R.
Yuan, Zheng
Title Exploiting structural and topological information to improve prediction of RNA-protein binding sites
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2009-10-18
Sub-type Article (original research)
DOI 10.1186/1471-2105-10-341
Open Access Status DOI
Volume 10
Start page 341.1
End page 341.14
Total pages 14
Editor Melissa Norton
Place of publication London, England, U.K.
Publisher BioMed Central
Collection year 2010
Language eng
Subject C1
970106 Expanding Knowledge in the Biological Sciences
060102 Bioinformatics
Formatted abstract
Background
RNA-protein interactions are important for a wide range of biological processes. Current computational methods to predict interacting residues in RNA-protein interfaces predominantly rely on sequence data. It is, however, known that interface residue propensity is closely correlated with structural properties. In this paper we systematically study information obtained from sequences and structures and compare their contributions to this prediction problem. In particular, different geometrical and network topological properties of protein structures are evaluated to improve interface residue prediction accuracy.
Results
We have quantified the impact of structural information on prediction accuracy in comparison to the purely sequence-based approach using two machine learning techniques: Naïve Bayes classifiers and Support Vector Machines. The highest AUC of 0.83 was achieved by a Support Vector Machine exploiting PSI-BLAST profile, accessible surface area, betweenness-centrality and retention coefficient as input features. Taking into account that our results are based on a larger non-redundant data set, the prediction accuracy is considerably higher than reported in previous, comparable studies. A protein-RNA interface predictor (PRIP) and the data set have been made available at http://www.qfab.org/PRIP.
Conclusion
Graph-theoretic properties of residue contact maps derived from protein structures, such as betweenness-centrality, can supplement sequence or structure features to improve the prediction accuracy for binding residues in RNA-protein interactions. While Support Vector Machines perform better on this task, Naïve Bayes classifiers have also been found to achieve good prediction accuracies but require much less training time and are an attractive choice for large-scale predictions.
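Illustrative note (not part of the published abstract): the betweenness-centrality feature mentioned above can be sketched as follows. This is a minimal, standard-library-only sketch and not the authors' implementation. The function names `contact_graph` and `betweenness`, the use of one coordinate per residue, and the 8 Å distance cutoff are all assumptions made for illustration; the abstract does not state how the contact map was built. The centrality itself is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Build an undirected residue contact graph as adjacency sets.

    Residues i and j are linked when their representative coordinates
    lie within `cutoff` (8.0 is an assumed value, not from the paper).
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        depth = {v: -1 for v in adj}
        sigma[s], depth[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest vertices first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Toy example: five residues on a line, 5 units apart, so only
# sequence neighbours are in contact and the graph is a simple chain.
coords = [(5.0 * i, 0.0, 0.0) for i in range(5)]
bc = betweenness(contact_graph(coords))
```

In this toy chain the central residue lies on the most shortest paths and therefore receives the highest score. In a real protein, the per-residue score would be one input feature alongside the sequence-derived and structural features listed in the abstract.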
Keyword Residues
Network
Recognition
Algorithms
Sequence
Database
DNA
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Article number 341

Document type: Journal Article
Sub-type: Article (original research)
Collections: 2010 Higher Education Research Data Collection
ERA 2012 Admin Only
Institute for Molecular Bioscience - Publications
Citation counts: TR Web of Science Citation Count: Cited 27 times in Thomson Reuters Web of Science
Scopus Citation Count: Cited 36 times in Scopus
Created: Sun, 29 Nov 2009, 00:05:31 EST