Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study

Zhao, Qiong-Yi, Wang, Yi, Kong, Yi-Meng, Luo, Da, Li, Xuan and Hao, Pei (2011) Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics, 12 Suppl. 14: 1-12. doi:10.1186/1471-2105-12-S14-S2


Author Zhao, Qiong-Yi
Wang, Yi
Kong, Yi-Meng
Luo, Da
Li, Xuan
Hao, Pei
Title Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2011-12-14
Sub-type Article (original research)
DOI 10.1186/1471-2105-12-S14-S2
Open Access Status DOI
Volume 12
Issue Suppl. 14
Start page 1
End page 12
Total pages 12
Place of publication London, United Kingdom
Publisher BioMed Central
Language eng
Formatted abstract
Background: With rapid advances in next-generation sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective approach to transcriptome study. De novo assembly of transcripts provides an important solution for transcriptome analysis in organisms with no reference genome. However, it was poorly understood how different variables affected assembly outcomes, and there was no consensus on how to reach an optimal solution by selecting a software tool and a suitable strategy based on the properties of the RNA-Seq data.
Results: To assess the performance of different programs for transcriptome assembly, this work analyzed several important factors, including k-mer values, genome complexity, coverage depth and directional reads. Seven program conditions were tested: four single k-mer (SK) assemblers (SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer (MK) methods (SOAPdenovo-MK, trans-ABySS and Oases-MK). While small and large k-mer values performed better for reconstructing lowly and highly expressed transcripts, respectively, the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well across various conditions but had the longest running time. Oases consumed the most memory, whereas SOAPdenovo had the shortest runtime but performed poorly at reconstructing full-length CDS. ABySS showed a good balance between resource usage and assembly quality.
Conclusions: Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. Some practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of the C. sinensis transcriptome was greatly improved using optimized methods.

Keyword Alignment
Ultrafast
Resource
Cancer
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ
Additional Notes From the 22nd International Conference on Genome Informatics, Busan, Korea, 5-7 December 2011

Document type: Journal Article
Collection: Queensland Brain Institute Publications
 
Citation counts: Thomson Reuters Web of Science: cited 129 times
Scopus: cited 145 times
Created: Sat, 14 Jun 2014, 05:07:32 EST by System User on behalf of Queensland Brain Institute