Exploring lateral genetic transfer among microbial genomes using TF-IDF

Cong, Yingnan, Chan, Yao-ban and Ragan, Mark A. (2016) Exploring lateral genetic transfer among microbial genomes using TF-IDF. Scientific Reports, 6. doi:10.1038/srep29319

Author Cong, Yingnan
Chan, Yao-ban
Ragan, Mark A.
Title Exploring lateral genetic transfer among microbial genomes using TF-IDF
Journal name Scientific Reports
ISSN 2045-2322
Publication date 2016-07-25
Year available 2016
Sub-type Article (original research)
DOI 10.1038/srep29319
Open Access Status DOI
Volume 6
Total pages 16
Place of publication London, United Kingdom
Publisher Nature Publishing Group
Collection year 2017
Language eng
Formatted abstract
Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT.
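To illustrate the document-analysis idea the abstract borrows, here is a minimal sketch of textbook TF-IDF in which each DNA sequence stands in for a "document" and each overlapping k-mer for a "term". This is an assumption-laden illustration, not the authors' implementation. The function names `kmers` and `tfidf` are hypothetical. The weighting shown (relative frequency times log inverse document frequency) is the standard textbook form. The paper's definition of documents (for example, groups of genomes), its choice of k, its handling of reverse complements, and its use of gap size to delineate lateral regions are not reproduced here.

```python
import math
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a sequence (hypothetical helper; no reverse complements)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(docs: dict[str, str], k: int) -> dict[str, dict[str, float]]:
    """Textbook TF-IDF with sequences as documents and k-mers as terms (illustrative sketch).

    tf(t, d) = occurrences of k-mer t in d / total k-mers in d
    idf(t)   = log(N / number of sequences containing t)
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in docs.items()}
    n_docs = len(docs)

    # Document frequency: the number of sequences in which each k-mer occurs at least once.
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())

    scores: dict[str, dict[str, float]] = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
    return scores


# Usage on toy sequences (made up for illustration, not data from the paper).
scores = tfidf({"A": "ACGTACGT", "B": "ACGTTTTT", "C": "GGGGACGT"}, k=4)
print(scores["A"]["ACGT"])                # → 0.0 (ACGT occurs in every sequence)
print(max(scores["B"], key=scores["B"].get))  # → TTTT
```

A k-mer present in every sequence receives weight zero because log(N/N) = 0. A k-mer that is frequent in one sequence but rare elsewhere receives a high weight. That contrast is the intuition behind using TF-IDF to flag sequence content that is distinctive of some groups and unusual in others.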
Keyword Lateral genetic transfer (LGT)
Term frequency-inverse document frequency (TF-IDF)
Escherichia coli
Microbial genomes
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Collections: HERDC Pre-Audit
Institute for Molecular Bioscience - Publications
Citation counts: Cited 1 time in Thomson Reuters Web of Science; cited 1 time in Scopus.
Created: Mon, 08 Aug 2016, 14:30:39 EST by Anthony Yeates on behalf of Learning and Research Services (UQ Library)