CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Parks, Donovan H., Imelfort, Michael, Skennerton, Connor T., Hugenholtz, Philip and Tyson, Gene W. (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25(7): 1043-1055. doi:10.1101/gr.186072.114


 
Author Parks, Donovan H.
Imelfort, Michael
Skennerton, Connor T.
Hugenholtz, Philip
Tyson, Gene W.
Title CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
Journal name Genome Research
ISSN 1549-5469
1088-9051
Publication date 2015-07-01
Year available 2015
Sub-type Article (original research)
DOI 10.1101/gr.186072.114
Open Access Status DOI
Volume 25
Issue 7
Start page 1043
End page 1055
Total pages 13
Place of publication Cold Spring Harbor, NY United States
Publisher Cold Spring Harbor Laboratory Press
Language eng
Abstract Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
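
The abstract describes estimating completeness and contamination from marker genes grouped by collocation. The sketch below illustrates that general idea only; it is not CheckM's implementation. The function name `estimate_quality`, the marker names (`mA` to `mG`), and the example grouping are hypothetical. It assumes completeness is the average fraction of markers found per collocated set, and contamination is the average number of surplus copies per marker in each set.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def estimate_quality(
    marker_sets: Sequence[frozenset],
    observed_hits: Iterable[str],
) -> Tuple[float, float]:
    """Return (completeness, contamination) as fractions.

    marker_sets   -- groups of marker genes assumed to be collocated
                     (hypothetical input, one frozenset per group)
    observed_hits -- one entry per marker gene copy found in the genome
    """
    if not marker_sets or any(len(s) == 0 for s in marker_sets):
        raise ValueError("marker_sets must be non-empty sets of markers")

    counts = Counter(observed_hits)
    completeness = 0.0
    contamination = 0.0
    for marker_set in marker_sets:
        # Markers seen at least once count toward completeness.
        present = sum(1 for gene in marker_set if counts[gene] >= 1)
        # Every copy beyond the first counts toward contamination.
        extra = sum(counts[gene] - 1 for gene in marker_set if counts[gene] > 1)
        completeness += present / len(marker_set)
        contamination += extra / len(marker_set)

    n_sets = len(marker_sets)
    return completeness / n_sets, contamination / n_sets


# Hypothetical example: three collocated sets, one duplicated marker (mC),
# and two markers (mF, mG) missing from the third set.
marker_sets = [
    frozenset({"mA", "mB"}),
    frozenset({"mC"}),
    frozenset({"mD", "mE", "mF", "mG"}),
]
hits = ["mA", "mB", "mC", "mC", "mD", "mE"]
completeness, contamination = estimate_quality(marker_sets, hits)
print(f"completeness={completeness:.1%} contamination={contamination:.1%}")
```

Averaging per set rather than per gene means a lost or duplicated genomic region that carries several collocated markers is weighted as one event, which is the motivation the abstract gives for using collocation information.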

 
Citation counts Web of Science (Thomson Reuters): cited 179 times
Scopus: cited 248 times
Created: Fri, 24 Jul 2015, 19:07:34 EST by Mrs Louise Nimwegen on behalf of Institute for Molecular Bioscience