A genome alignment algorithm based on compression

Cao, Minh Duc, Dix, Trevor I. and Allison, Lloyd (2010) A genome alignment algorithm based on compression. BMC Bioinformatics, 11 1-16. doi:10.1186/1471-2105-11-599

Author Cao, Minh Duc
Dix, Trevor I.
Allison, Lloyd
Title A genome alignment algorithm based on compression
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2010-12-16
Sub-type Article (original research)
DOI 10.1186/1471-2105-11-599
Open Access Status DOI
Volume 11
Start page 1
End page 16
Total pages 16
Place of publication London, United Kingdom
Publisher BioMed Central
Language eng
Formatted abstract
Background: Traditional genome alignment methods consider sequence alignment as a variation of the string edit distance problem, and perform alignment by matching characters of the two sequences. They are often computationally expensive and unable to deal with low information regions. Furthermore, they lack a well-principled objective function to measure the performance of sets of parameters. Since genomic sequences carry genetic information, this article proposes that the information content of each nucleotide in a position should be considered in sequence alignment. An information-theoretic approach for pairwise genome local alignment, namely XMAligner, is presented. Instead of comparing sequences at the character level, XMAligner considers a pair of nucleotides from two sequences to be related if their mutual information in context is significant. The information content of nucleotides in sequences is measured by a lossless compression technique.
Results: Experiments on both simulated data and real data show that XMAligner is superior to conventional methods, especially on distantly related sequences and statistically biased data. XMAligner can align sequences of eukaryote genome size with only a modest hardware requirement. Importantly, the method has an objective function which can obviate the need to choose parameter values for high quality alignment. The alignment results from XMAligner can be integrated into a visualisation tool for viewing.
Conclusions: The information-theoretic approach for sequence alignment is shown to overcome the problems of conventional character matching alignment methods mentioned above. The article shows that, as genomic sequences are meant to carry information, considering the information content of nucleotides is helpful for genomic sequence alignment.
Availability: Downloadable binaries, documentation and data can be found at ftp://ftp.infotech.monash.edu.au/software/DNAcompress-XM/XMAligner/.
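Illustrative note: the idea in the abstract of measuring "mutual information in context" with a compression model can be sketched roughly as follows. This is not XMAligner's model or code; it substitutes a much simpler adaptive order-k context model, and every name and parameter here (`information_content`, `information_gain`, `k`, `alpha`) is a hypothetical choice made for illustration. The sketch estimates how many bits each nucleotide of one sequence costs on its own, how many it costs once the model has already seen the other sequence, and treats the difference as the information the two share at that position.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


def new_model():
    # Maps a context string to counts of the symbols seen after it.
    return defaultdict(lambda: defaultdict(int))


def information_content(seq, counts, k=8, alpha=0.1):
    """Per-symbol cost in bits under an adaptive order-k context model.

    `counts` is updated in place, so a model already trained on another
    sequence acts as background knowledge for this one.
    (Assumed toy model, not the model used by XMAligner.)
    """
    bits = []
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]
        seen = counts[ctx]
        total = sum(seen.values()) + alpha * len(ALPHABET)
        p = (seen[sym] + alpha) / total  # smoothed probability estimate
        bits.append(-math.log2(p))
        seen[sym] += 1  # learn from the symbol after coding it
    return bits


def information_gain(x, y, k=8, alpha=0.1):
    """Bits saved on each symbol of y when the model has first seen x."""
    alone = information_content(y, new_model(), k, alpha)
    primed = new_model()
    information_content(x, primed, k, alpha)  # train on x, discard costs
    given_x = information_content(y, primed, k, alpha)
    return [a - b for a, b in zip(alone, given_x)]


# Synthetic data for demonstration only.
rng = random.Random(0)
x = "".join(rng.choice(ALPHABET) for _ in range(2000))
z = "".join(rng.choice(ALPHABET) for _ in range(2000))

related = sum(information_gain(x, x))    # y is a copy of x
unrelated = sum(information_gain(x, z))  # z is independent of x
print(f"bits saved, related pair:   {related:.1f}")
print(f"bits saved, unrelated pair: {unrelated:.1f}")
```

Under these assumptions a related pair should save far more bits than an unrelated one; positions with a large per-symbol saving are the ones such a scheme would regard as candidates for alignment.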
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collection: Institute for Molecular Bioscience - Publications
Citation counts: Thomson Reuters Web of Science: cited 0 times
Scopus: cited 5 times
Created: Thu, 28 Aug 2014, 13:59:58 EST by Ms Kate Rowe on behalf of Institute for Molecular Bioscience