Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

Mavromatis, Konstantinos, Ivanova, Natalia, Barry, Kerry, Shapiro, Harris, Goltsman, Eugene, McHardy, Alice C., Rigoutsos, Isidore, Salamov, Asaf, Korzeniewski, Frank, Land, Miriam, Lapidus, Alla, Grigoriev, Igor, Richardson, Paul, Hugenholtz, Philip and Kyrpides, Nikos C. (2007) Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nature Methods, 4 (6): 495-500. doi:10.1038/nmeth1043


Author Mavromatis, Konstantinos
Ivanova, Natalia
Barry, Kerry
Shapiro, Harris
Goltsman, Eugene
McHardy, Alice C.
Rigoutsos, Isidore
Salamov, Asaf
Korzeniewski, Frank
Land, Miriam
Lapidus, Alla
Grigoriev, Igor
Richardson, Paul
Hugenholtz, Philip
Kyrpides, Nikos C.
Title Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
Journal name Nature Methods
ISSN 1548-7091
1548-7105
Publication date 2007-06
Sub-type Article (original research)
DOI 10.1038/nmeth1043
Volume 4
Issue 6
Start page 495
End page 500
Total pages 6
Place of publication London, England, U.K.
Publisher Nature Publishing Group
Language eng
Formatted abstract
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
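The construction and evaluation idea in the abstract (mixing reads randomly selected from known isolate genomes, then scoring a processing step against the true origin) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names sample_metagenome and bin_purity, the abundance-weighted draw without replacement, and the majority-source purity score are hypothetical choices made only for illustration. Each sampled read keeps its source-genome label, which is what allows a binning result to be checked against the isolate it actually came from.

```python
import random
from collections import Counter

def sample_metagenome(read_pools, abundances, n_reads, seed=0):
    """Build a simulated metagenome as a list of (genome_id, read) pairs.

    Hypothetical inputs: read_pools maps genome_id -> list of reads from that
    isolate; abundances maps genome_id -> relative weight in the community.
    """
    rng = random.Random(seed)
    genomes = list(abundances)
    weights = [abundances[g] for g in genomes]
    # Decide how many reads each genome contributes, in proportion to its weight.
    counts = Counter(rng.choices(genomes, weights=weights, k=n_reads))
    sample = []
    for g in genomes:
        # Draw reads without replacement; raises ValueError if a pool is too small.
        for read in rng.sample(read_pools[g], counts[g]):
            sample.append((g, read))
    rng.shuffle(sample)  # mix reads so their order carries no origin information
    return sample

def bin_purity(true_sources, bin_ids):
    """Share of items whose bin's majority source equals their true source."""
    per_bin = {}
    for src, b in zip(true_sources, bin_ids):
        per_bin.setdefault(b, Counter())[src] += 1
    correct = sum(c.most_common(1)[0][1] for c in per_bin.values())
    return correct / len(true_sources)
```

For example, sample_metagenome(pools, {"A": 3, "B": 1}, 4) draws four labelled reads, on average three from genome A for every one from genome B. Purity is only one possible fidelity measure; the study itself compared assemblies, gene predictions and bins against the corresponding isolate genomes.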
Keyword Microbial Genomes
Metabolism
Bacterium
Sequences
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collections: ERA 2012 Admin Only
School of Chemistry and Molecular Biosciences
Citation counts: Cited 180 times in Thomson Reuters Web of Science
Cited 201 times in Scopus
Created: Mon, 28 Nov 2011, 09:43:11 EST by Hong Lee on behalf of School of Chemistry & Molecular Biosciences