COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Liu, Binghang, Yuan, Jianying, Yiu, Siu-Ming, Li, Zhenyu, Xie, Yinlong, Chen, Yanxiang, Shi, Yujian, Zhang, Hao, Li, Yingrui, Lam, Tak-Wah and Luo, Ruibang (2012) COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly. Bioinformatics, 28(22): 2870-2874. doi:10.1093/bioinformatics/bts563


Author Liu, Binghang
Yuan, Jianying
Yiu, Siu-Ming
Li, Zhenyu
Xie, Yinlong
Chen, Yanxiang
Shi, Yujian
Zhang, Hao
Li, Yingrui
Lam, Tak-Wah
Luo, Ruibang
Title COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly
Journal name Bioinformatics
ISSN 1367-4803
1367-4811
Publication date 2012-11-15
Year available 2012
Sub-type Article (original research)
DOI 10.1093/bioinformatics/bts563
Open Access Status Not yet assessed
Volume 28
Issue 22
Start page 2870
End page 2874
Total pages 5
Place of publication Oxford, United Kingdom
Publisher Oxford University Press
Language eng
Formatted abstract
Motivation: The boost of next-generation sequencing technologies provides us with an unprecedented opportunity for elucidating genetic mysteries, yet the short-read length hinders us from better assembling the genome from scratch. New protocols now exist that can generate overlapping pair-end reads. By joining the 3′ ends of each read pair, one is able to construct longer reads for assembling. However, effectively joining two overlapped pair-end reads remains a challenging task.
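
As a rough illustration of what joining the 3′ ends of a read pair involves, the sketch below reverse-complements the second read and looks for the longest suffix-prefix overlap with the first. This is a naive overlap search, not COPE's k-mer-frequency method; the function names, the `min_overlap` threshold and the `max_mismatch_rate` threshold are all hypothetical choices made for the example.

```python
# Illustrative sketch only: a naive overlap-based join,
# NOT the k-mer-frequency method used by COPE.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(read1: str, read2: str,
              min_overlap: int = 10,
              max_mismatch_rate: float = 0.1):
    """Join a read pair whose 3' ends overlap.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before comparison. Returns the joined read,
    or None when no acceptable overlap is found. The thresholds are
    hypothetical defaults, not values taken from the paper.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    # Try the longest candidate overlap first: suffix of read1 vs prefix of mate.
    for size in range(longest, min_overlap - 1, -1):
        suffix, prefix = read1[-size:], mate[:size]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * size:
            # Keep read1 in full and append the non-overlapping tail of the mate.
            return read1 + mate[size:]
    return None
```

A search like this accepts the first overlap that passes the mismatch threshold, so repeats and sequencing errors can produce wrong joins; that ambiguity is the difficulty the abstract refers to.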

Result: In this article, we present an efficient tool called Connecting Overlapped Pair-End (COPE) reads, to connect overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is 10% and 2% higher, respectively, than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads.
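
For readers unfamiliar with the N50 measurement mentioned above, the following minimal sketch computes it under the usual definition: the largest length L such that contigs of length at least L together cover at least half of the total assembled bases. The function name `n50` and the example lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length. Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

A fold improvement such as the one reported in the abstract would be the ratio of two such values, one for each assembly.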
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Sub-type: Article (original research)
Collection: School of Chemistry and Molecular Biosciences
 
Citation counts: Thomson Reuters Web of Science: cited 48 times
Scopus: cited 51 times
Created: Tue, 01 Sep 2015, 22:50:12 EST by System User on behalf of Learning and Research Services (UQ Library)