Estimating change-points in biological sequences via the cross-entropy method

Evans, G. E., Sofronov, G. Y., Keith, J. M. and Kroese, D. P. (2011) Estimating change-points in biological sequences via the cross-entropy method. Annals of Operations Research, 189(1): 155-165. doi:10.1007/s10479-010-0687-0

Author Evans, G. E.
Sofronov, G. Y.
Keith, J. M.
Kroese, D. P.
Title Estimating change-points in biological sequences via the cross-entropy method
Journal name Annals of Operations Research
ISSN 0254-5330
Publication date 2011
Year available 2010
Sub-type Article (original research)
DOI 10.1007/s10479-010-0687-0
Volume 189
Issue 1
Start page 155
End page 165
Total pages 11
Place of publication United States
Publisher Springer New York LLC
Collection year 2011
Language eng
Abstract The genomes of complex organisms, including the human genome, are known to vary in GC content along their length. That is, they vary in the local proportion of the nucleotides G and C, as opposed to the nucleotides A and T. Changes in GC content are often abrupt, producing well-defined regions. We model DNA sequences as a multiple change-point process in which the sequence is separated into segments by an unknown number of change-points, with each segment supposed to have been generated by a different process. Multiple change-point problems are important in many biological applications, particularly in the analysis of DNA sequences. Multiple change-point problems also arise in segmentation of protein sequences according to hydrophobicity. We use the Cross-Entropy method to estimate the positions of the change-points. Parameters of the process for each segment are approximated with maximum likelihood estimates. Numerical experiments illustrate the effectiveness of the approach. We obtain estimates of the locations of change-points in artificially generated sequences and compare the accuracy of these estimates with those obtained via other methods such as IsoFinder (Oliver et al. in Nucl. Acids Res. 32:W283-W292, 2004) and Markov Chain Monte Carlo. Lastly, we provide examples with real data sets to illustrate the usefulness of our method. © 2010 Springer Science+Business Media, LLC.
Keyword Genome
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
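
The abstract describes splitting a sequence into segments at change-points and fitting each segment's parameters by maximum likelihood. As a minimal sketch of that scoring step only (not the authors' Cross-Entropy implementation), the following assumes each base is reduced to a GC/AT indicator and each segment is i.i.d. Bernoulli; the function names `to_gc_indicator` and `segment_log_likelihood` and the toy sequence are hypothetical.

```python
import math


def to_gc_indicator(seq):
    """Map a DNA string to 1 for G or C and 0 for any other base."""
    return [1 if base in "GC" else 0 for base in seq.upper()]


def segment_log_likelihood(bits, change_points):
    """Log-likelihood of a 0/1 sequence split at the given change-points.

    Assumption: each segment is i.i.d. Bernoulli, with its probability set
    to the maximum likelihood estimate (the fraction of ones in the segment).
    """
    bounds = [0] + sorted(change_points) + [len(bits)]
    total = 0.0
    for start, end in zip(bounds, bounds[1:]):
        n = end - start
        if n == 0:
            continue  # empty segment contributes nothing
        ones = sum(bits[start:end])
        # Add count * log(MLE probability) for ones and for zeros,
        # skipping zero counts to avoid log(0).
        for count in (ones, n - ones):
            if count > 0:
                total += count * math.log(count / n)
    return total


# Toy sequence (made up): an AT-rich block followed by a GC-rich block.
bits = to_gc_indicator("ATATATATAT" + "GCGCGGCCGC")

# Exhaustive search over a single change-point position.
best = max(range(1, len(bits)),
           key=lambda c: segment_log_likelihood(bits, [c]))
print(best)
```

Exhaustive search is only feasible here because the toy problem has a single change-point; with an unknown number of change-points, a stochastic search such as the Cross-Entropy method named in the abstract would propose candidate positions and could use a score of this kind to rank them.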

Document type: Journal Article
Collections: School of Mathematics and Physics
Official 2011 Collection
Citation counts: Thomson Reuters Web of Science: cited 8 times
Scopus: cited 13 times
Created: Wed, 02 Mar 2011, 14:25:28 EST by Kay Mackie on behalf of School of Mathematics & Physics