Evaluation of viral genome assembly and diversity estimation in deep metagenomes

Aguirre de Carcer, Daniel, Angly, Florent E. and Alcami, Antonio (2014) Evaluation of viral genome assembly and diversity estimation in deep metagenomes. BMC Genomics, 15(1). doi:10.1186/1471-2164-15-989

Author Aguirre de Carcer, Daniel
Angly, Florent E.
Alcami, Antonio
Title Evaluation of viral genome assembly and diversity estimation in deep metagenomes
Journal name BMC Genomics
ISSN 1471-2164
Publication date 2014-11-01
Year available 2014
Sub-type Article (original research)
DOI 10.1186/1471-2164-15-989
Open Access Status DOI
Volume 15
Issue 1
Total pages 11
Place of publication London, United Kingdom
Publisher BioMed Central
Language eng
Formatted abstract
Viruses have unique properties, such as small genomes and regions of high similarity, whose effects on metagenomic assemblies have not yet been characterized. This study uses diverse in silico simulated viromes to evaluate how extensively genomes can be assembled using different sequencing platforms and assemblers. It also investigates the suitability of different methods for estimating viral diversity in metagenomes.

We created in silico metagenomes mimicking various platforms at different sequencing depths. The CLC assembler performed worse than IDBA_UD and CAMERA, which are metagenome-specific. Up to a saturation point, Illumina platforms proved more capable than 454 of reconstructing large portions of viral genomes. Read length was an important factor in limiting chimericity, while scaffolding marginally improved contig length and accuracy. The genome length of the various viruses in the metagenomes did not significantly affect genome reconstruction, but the co-existence of highly similar genomes was detrimental. When evaluating diversity estimation tools, we found that PHACCS results were more accurate than those from CatchAll and clustering, both of which were orders of magnitude above expected values.

Assemblers designed specifically for the analysis of metagenomes should be used to facilitate the creation of high-quality long contigs. Despite the high coverage possible, scientists should not expect to always obtain complete genomes, because their reconstruction may be hindered by co-existing species bearing highly similar genomic regions. Further development of metagenomics-oriented assemblers may help bypass these limitations in future studies. Meanwhile, the lack of fully reconstructed communities keeps methods to estimate viral diversity relevant. While none of the three methods tested had absolute precision, only PHACCS was deemed suitable for comparative studies.
Keyword Assembly
Sequencing Data
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: Official 2015 Collection
School of Chemistry and Molecular Biosciences
Citation counts: Thomson Reuters Web of Science: cited 1 time
Scopus: cited 10 times
Created: Sun, 18 Jan 2015, 11:17:41 EST by System User on behalf of School of Chemistry & Molecular Biosciences