Recapitulating phylogenies using k-mers: from trees to networks [version 2; referees: 2 approved]

Bernard, Guillaume, Ragan, Mark A. and Chan, Cheong Xin (2016) Recapitulating phylogenies using k-mers: from trees to networks [version 2; referees: 2 approved]. F1000Research, 5: 2789. doi:10.12688/f1000research.10225.2

Attached Files
Name Bernard2016_F1000Research.pdf
Description Full text (open access)
MIMEType application/pdf
Size 2.74MB

Author Bernard, Guillaume
Ragan, Mark A.
Chan, Cheong Xin
Title Recapitulating phylogenies using k-mers: from trees to networks [version 2; referees: 2 approved]
Journal name F1000Research
ISSN 2046-1402
Publication date 2016-12-23
Sub-type Article (original research)
DOI 10.12688/f1000research.10225.2
Open Access Status DOI
Volume 5
Issue 2789
Total pages 12
Place of publication London, United Kingdom
Publisher Faculty of 1000
Collection year 2017
Language eng
Formatted abstract
Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
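As a rough illustration of the idea of relating genomes by their shared k-mers, the following minimal Python sketch builds a small network from toy sequences. It is not the authors' method: the abstract does not specify the similarity measure, so the Jaccard index used here is an assumption, and the names `kmers`, `shared_kmer_network`, `min_similarity` and the toy sequences are hypothetical.

```python
from itertools import combinations


def kmers(sequence, k):
    """Return the set of distinct k-mers (substrings of length k) in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_network(genomes, k, min_similarity):
    """Return edges (a, b, shared_count, similarity) between similar genomes.

    Similarity is the Jaccard index of the two k-mer sets. This measure is an
    assumption chosen for illustration, not the statistic used in the paper.
    """
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        shared = len(profiles[a] & profiles[b])   # k-mers present in both
        union = len(profiles[a] | profiles[b])    # k-mers present in either
        similarity = shared / union if union else 0.0
        if similarity >= min_similarity:
            edges.append((a, b, shared, similarity))
    return edges


# Toy sequences (hypothetical); real genomes would be read from FASTA files.
toy_genomes = {
    "A": "ACGTACGTGA",
    "B": "ACGTACGTCC",
    "C": "TTTTGGGGCC",
}
edges = shared_kmer_network(toy_genomes, k=4, min_similarity=0.3)
print(edges)
```

Raising or lowering `min_similarity` changes which edges appear, which is one simple way a network of this kind can be explored at different levels of relatedness.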
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: HERDC Pre-Audit
Institute for Molecular Bioscience - Publications
 
Created: Tue, 07 Feb 2017, 13:29:00 EST by Cheong Xin Chan on behalf of Institute for Molecular Bioscience