Discrimination of Non-Protein-Coding Transcripts from Protein-Coding mRNA

Frith, M. C., Bailey, T. L., Kasukawa, T., Mignone, F., Kummerfeld, S. K., Medera, M., Sunkara, S., Furuno, M., Bult, C. J., Quackenbush, J., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., Pesole, G. and Mattick, J. S. (2006) Discrimination of Non-Protein-Coding Transcripts from Protein-Coding mRNA. RNA Biology, 3(1): 40-48.


Author Frith, M. C.
Bailey, T. L.
Kasukawa, T.
Mignone, F.
Kummerfeld, S. K.
Medera, M.
Sunkara, S.
Furuno, M.
Bult, C. J.
Quackenbush, J.
Kai, C.
Kawai, J.
Carninci, P.
Hayashizaki, Y.
Pesole, G.
Mattick, J. S.
Title Discrimination of Non-Protein-Coding Transcripts from Protein-Coding mRNA
Journal name RNA Biology
ISSN 1547-6286; 1555-8284
Publication date 2006-03
Year available 2006
Sub-type Article (original research)
Volume 3
Issue 1
Start page 40
End page 48
Total pages 9
Place of publication Georgetown, TX, USA
Publisher Landes Bioscience
Collection year 2006
Language eng
Subject 060407 Genome Structure and Regulation
Abstract Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discrimination of coding and noncoding transcripts is not straightforward, and different laboratories have used different methods, whose ability to perform this discrimination is unclear. In this study, we examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in the discrimination of noncoding from coding sequences, based on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous versus nonsynonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and noncoding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed noncoding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, our analyses also provide evidence that as much as ~10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of actually noncoding transcripts.
Keyword transcriptome
proteome
ncRNA
mRNA
bioinformatics
Q-Index Code C1

Created: Tue, 20 Jan 2009, 16:51:57 EST by Maryanne Watson on behalf of Library Corporate Services