Exploiting sequence dependencies in the prediction of peroxisomal proteins

Wakabayashi, M., Hawkins, J. C., Maetschke, S. R. and Boden, M. B. (2005). Exploiting sequence dependencies in the prediction of peroxisomal proteins. In: M. Gallagher, J. Hogan and F. Maire, Intelligent Data Engineering and Automated Learning - IDEAL2005. Intelligent Data Engineering and Automated Learning - IDEAL2005, Brisbane, Australia, (454-461). 6-8 July 2005.

Author Wakabayashi, M.
Hawkins, J. C.
Maetschke, S. R.
Boden, M. B.
Title of paper Exploiting sequence dependencies in the prediction of peroxisomal proteins
Conference name Intelligent Data Engineering and Automated Learning - IDEAL2005
Conference location Brisbane, Australia
Conference dates 6-8 July 2005
Proceedings title Intelligent Data Engineering and Automated Learning - IDEAL2005
Journal name Intelligent Data Engineering and Automated Learning Ideal 2005, Proceedings
Place of Publication Berlin, Germany
Publisher Springer-Verlag
Publication Year 2005
Sub-type Fully published paper
ISBN 9783540269724
ISSN 0302-9743
Editor M. Gallagher
J. Hogan
F. Maire
Volume 3578
Start page 454
End page 461
Total pages 8
Collection year 2005
Language eng
Abstract/Summary Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct motifs at the end of the amino acid sequence. PTS1 peroxisomal proteins have a well conserved tripeptide at the C-terminal end. However, the preceding residues in the sequence arguably play a crucial role in targeting the protein to the peroxisome. Previous work in applying machine learning to the prediction of peroxisomal matrix proteins has failed to capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms, and show that a classifier - based on the Support Vector Machine - produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We publish an updated and rigorously curated data set that results in increased prediction accuracy of most tested models.
Subjects E1
280207 Pattern Recognition
780101 Mathematical sciences
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status UQ

 
Citation counts: Thomson Reuters Web of Science: cited 3 times
Scopus: cited 0 times
Created: Thu, 23 Aug 2007, 21:02:32 EST