SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Luo, Ruibang, Liu, Binghang, Xie, Yinlong, Li, Zhenyu, Huang, Weihua, Yuan, Jianying, He, Guangzhu, Chen, Yanxiang, Pan, Qi, Liu, Yunjie, Tang, Jingbo, Wu, Gengxiong, Zhang, Hao, Shi, Yujian, Liu, Yong, Yu, Chang, Wang, Bo, Lu, Yao, Han, Changlei, Cheung, David W., Yiu, Siu-Ming, Peng, Shaoliang, Zhu, Xiaoqian, Liu, Guangming, Liao, Xiangke, Li, Yingrui, Yang, Huanming, Wang, Jian, Lam, Tak-Wah and Wang, Jun (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience, 1(18). doi:10.1186/2047-217X-1-18


Author Luo, Ruibang
Liu, Binghang
Xie, Yinlong
Li, Zhenyu
Huang, Weihua
Yuan, Jianying
He, Guangzhu
Chen, Yanxiang
Pan, Qi
Liu, Yunjie
Tang, Jingbo
Wu, Gengxiong
Zhang, Hao
Shi, Yujian
Liu, Yong
Yu, Chang
Wang, Bo
Lu, Yao
Han, Changlei
Cheung, David W.
Yiu, Siu-Ming
Peng, Shaoliang
Zhu, Xiaoqian
Liu, Guangming
Liao, Xiangke
Li, Yingrui
Yang, Huanming
Wang, Jian
Lam, Tak-Wah
Wang, Jun
Title SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler
Journal name Gigascience
ISSN 2047-217X
Publication date 2012-12
Year available 2012
Sub-type Article (original research)
DOI 10.1186/2047-217X-1-18
Open Access Status DOI
Volume 1
Issue 18
Total pages 6
Place of publication London, United Kingdom
Publisher BioMed Central
Language eng
Formatted abstract
Background
There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.

Findings
To overcome these challenges, we have developed its successor, SOAPdenovo2, whose new algorithm design reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.

Conclusions
Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
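
The contig and scaffold N50 figures quoted above are a standard contiguity statistic: the length L such that sequences of length L or longer together cover at least half of the total assembled bases. The sketch below illustrates that common definition only; it is not taken from SOAPdenovo2 or from the paper's evaluation scripts, and the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Common definition (an assumption here, not quoted from the paper):
    sort lengths from longest to shortest and return the length at which
    the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (not data from the YH assembly).
toy_contigs = [2, 3, 4, 5, 6, 7, 8]
print(n50(toy_contigs))
```

For the toy input the total is 35, and the three longest contigs (8, 7 and 6) are the first to cover at least half of it, so the N50 is 6. A higher N50 indicates that more of the assembly sits in long sequences, which is why the abstract reports it for both contigs and scaffolds.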
Keyword Genome
Assembly
Contig
Scaffold
Error correction
Gap filling
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Sub-type: Article (original research)
Collection: Institute for Molecular Bioscience - Publications
 
Citation counts: Thomson Reuters Web of Science: cited 538 times
Scopus: cited 588 times
Created: Tue, 01 Sep 2015, 12:49:35 EST by System User on behalf of Institute for Molecular Bioscience