Annotation of metagenome short reads using proxygenes

Dalevi, Daniel, Ivanova, Natalia N., Mavromatis, Konstantinos, Hooper, Sean D., Szeto, Ernest, Hugenholtz, Philip, Kyrpides, Nikos C. and Markowitz, Victor M. (2008). Annotation of metagenome short reads using proxygenes. In: Anna Tramontano, ECCB 2008 Conference proceedings. Joint Meeting of the 7th European Conference on Computational Biology and the 5th Meeting of the Bioinformatics Italian Society (ECCB 2008), Cagliari, Italy, (I7-I13). 22-26 September 2008. doi:10.1093/bioinformatics/btn276


Author Dalevi, Daniel
Ivanova, Natalia N.
Mavromatis, Konstantinos
Hooper, Sean D.
Szeto, Ernest
Hugenholtz, Philip
Kyrpides, Nikos C.
Markowitz, Victor M.
Title of paper Annotation of metagenome short reads using proxygenes
Conference name Joint Meeting of the 7th European Conference on Computational Biology and the 5th Meeting of the Bioinformatics Italian Society (ECCB 2008)
Conference location Cagliari, Italy
Conference dates 22-26 September 2008
Proceedings title ECCB 2008 Conference proceedings
Journal name Bioinformatics
Place of Publication Oxford, U.K.
Publisher Oxford University Press
Publication Year 2008
Sub-type Fully published paper
DOI 10.1093/bioinformatics/btn276
ISSN 1367-4803
1367-4811
Editor Anna Tramontano
Volume 24
Issue 16
Start page I7
End page I13
Total pages 7
Language eng
Formatted Abstract/Summary Motivation: A typical metagenome dataset generated using a 454 pyrosequencing platform consists of short reads sampled from the collective genome of a microbial community. The amount of sequence in such datasets is usually insufficient for assembly, and traditional gene prediction cannot be applied to unassembled short reads. As a result, analysis of such datasets usually involves comparisons in terms of relative abundances of various protein families. The latter requires assignment of individual reads to protein families, which is hindered by the fact that short reads contain only a fragment, usually small, of a protein.

Results: We have considered the assignment of pyrosequencing reads to protein families directly using RPS-BLAST against COG and Pfam databases and indirectly via proxygenes that are identified using BLASTx searches against protein sequence databases. Using simulated metagenome datasets as benchmarks, we show that the proxygene method is more accurate than the direct assignment. We introduce a clustering method which significantly reduces the size of a metagenome dataset while maintaining a faithful representation of its functional and taxonomic content.
Keyword Genome annotation
Protein families
Database
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status Non-UQ
Additional Notes This special issue of Bioinformatics contains the proceedings of the joint meeting of the 7th European Conference on Computational Biology (ECCB) and the 5th Meeting of the Bioinformatics Italian Society (BITS). The conference was held from September 22 to 26 in the Congress Center of Cagliari, Italy.

Document type: Conference Paper
Collections: ERA 2012 Admin Only
School of Chemistry and Molecular Biosciences
 
Citation counts: Thomson Reuters Web of Science: cited 14 times
Scopus: cited 24 times
Access Statistics: 44 Abstract Views
Created: Thu, 01 Dec 2011, 12:52:13 EST by System User on behalf of School of Chemistry & Molecular Biosciences