Robust estimation of evolutionary distances with information theory

Cao, Minh Duc, Allison, Lloyd, Dix, Trevor I. and Boden, Mikael (2016) Robust estimation of evolutionary distances with information theory. Molecular Biology and Evolution, 33 5: 1349-1357. doi:10.1093/molbev/msw019


Author Cao, Minh Duc
Allison, Lloyd
Dix, Trevor I.
Boden, Mikael
Title Robust estimation of evolutionary distances with information theory
Journal name Molecular Biology and Evolution
ISSN 1537-1719
0737-4038
Publication date 2016-02-23
Year available 2016
Sub-type Article (original research)
DOI 10.1093/molbev/msw019
Open Access Status Not Open Access
Volume 33
Issue 5
Start page 1349
End page 1357
Total pages 9
Place of publication Cary, United States
Publisher Oxford University Press
Collection year 2017
Language eng
Formatted abstract
Methods for measuring genetic distances in phylogenetics are known to be sensitive to the evolutionary model assumed. However, there is a lack of established methodology to accommodate the trade-off between incorporating sufficient biological reality and avoiding model overfitting. In addition, as traditional methods measure distances based on the observed number of substitutions, they tend to underestimate distances between diverged sequences due to backward and parallel substitutions. Various techniques have been proposed to correct this, but they lack robustness against sequences that are distantly related and of unequal base frequencies. In this article, we present a novel genetic distance estimate based on information theory that overcomes the above two hurdles. Instead of examining the observed number of substitutions, this method estimates genetic distances using Shannon’s mutual information. This naturally provides an effective framework for balancing model complexity and goodness of fit. Our distance estimate is shown to be approximately linear in elapsed time and hence is less sensitive to the divergence of sequence data and to compositionally biased sequences. Using extensive simulation data, we show that our method 1) consistently reconstructs more accurate phylogeny topologies than existing methods, 2) is robust in extreme conditions such as diverged phylogenies, data with unequal base frequencies, and heterogeneous mutation patterns, and 3) scales well with large phylogenies.
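To make the central quantity in the abstract concrete, the following is a minimal sketch of a plug-in estimate of Shannon's mutual information between two aligned sequences, computed from per-site symbol frequencies. It is an illustration only and is not the authors' estimator: the function name `mutual_information`, the assumption of gap-free sequences of equal length, and the independent-sites frequency model are all assumptions introduced here. How the article turns mutual information into a distance is not reproduced.

```python
from collections import Counter
from math import log2


def mutual_information(seq_a: str, seq_b: str) -> float:
    """Plug-in estimate of Shannon mutual information, in bits per site,
    between two aligned, gap-free sequences of equal length.

    Hypothetical helper for illustration; sites are treated as
    independent draws, which is a simplifying assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seq_a)
    if n == 0:
        raise ValueError("sequences must be non-empty")

    joint = Counter(zip(seq_a, seq_b))  # counts of aligned symbol pairs
    marg_a = Counter(seq_a)             # symbol counts in the first sequence
    marg_b = Counter(seq_b)             # symbol counts in the second sequence

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) / (p(a) * p(b)) simplifies to count * n / (n_a * n_b)
        mi += p_ab * log2(count * n / (marg_a[a] * marg_b[b]))
    return mi


if __name__ == "__main__":
    identical = "ACGT" * 25
    print(mutual_information(identical, identical))
    print(mutual_information("AACC", "ACAC"))
```

Under this sketch, two identical sequences with uniform base composition share the full 2 bits per site, while sequences whose aligned symbols are statistically independent share none. Sequences that share less information would be treated as more distant.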
Keyword Genetic distance
Information theory
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: HERDC Pre-Audit
School of Chemistry and Molecular Biosciences
Institute for Molecular Bioscience - Publications
 
Citation counts: Thomson Reuters Web of Science: cited 0 times
Scopus: cited 0 times
Created: Fri, 11 Mar 2016, 14:31:52 EST by Mrs Louise Nimwegen on behalf of School of Chemistry & Molecular Biosciences