Statistical analysis on microarray data: selection of gene prognosis signatures

Le Cao, Kim-Anh and McLachlan, Geoffrey J. (2009). Statistical analysis on microarray data: selection of gene prognosis signatures. In Tuan Pham (Ed.), Computational biology: issues and applications in oncology (pp. 55-76). New York, United States: Springer. doi:10.1007/978-1-4419-0811-7_3


Author Le Cao, Kim-Anh; McLachlan, Geoffrey J.
Title of chapter Statistical analysis on microarray data: selection of gene prognosis signatures
Title of book Computational biology: issues and applications in oncology
Place of Publication New York, United States
Publisher Springer
Publication Year 2009
Sub-type Research book chapter (original research)
DOI 10.1007/978-1-4419-0811-7_3
ISBN 9781441908100; 9781441908117
Editor Tuan Pham
Chapter number 3
Start page 55
End page 76
Total pages 22
Total chapters 12
Collection year 2010
Language eng
Subjects B1; 970101 Expanding Knowledge in the Mathematical Sciences; 010401 Applied Statistics
Abstract/Summary Microarrays are being increasingly used in cancer research for a better understanding of the molecular variations among tumours or other biological conditions. They allow for the measurement of tens of thousands of transcripts simultaneously in one single experiment. The problem of analysing these data sets becomes non-standard and represents a challenge for both statisticians and biologists, as the dimension of the feature space (the number of genes or transcripts) is much greater than the number of tissues. Therefore, the selection of marker genes among thousands to diagnose a cancer type is of crucial importance and can help clinicians to develop gene-expression-based diagnostic tests to guide therapy in cancer patients. In this chapter, we focus on the classification and the prediction of a sample given some carefully chosen gene expression profiles. We review some state-of-the-art machine learning approaches to perform gene selection: recursive feature elimination, nearest-shrunken centroids and random forests. We discuss the difficulties that can be encountered when dealing with microarray data, such as selection bias, multiclass and unbalanced problems. The three approaches are then applied and compared on a typical cancer gene expression study.
Q-Index Code B1
Q-Index Status Confirmed Code
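
The abstract lists nearest shrunken centroids among the gene selection methods the chapter reviews. As a rough illustration only (not taken from the chapter), the Python sketch below shows the core idea under stated assumptions: each gene's class centroid is standardised against the overall centroid, soft-thresholded by an amount delta, and a gene is kept if at least one class retains a non-zero score. The function names, the toy data and the values of delta are hypothetical, and the scaling factor m_k = sqrt(1/n_k - 1/n) is a choice made for this sketch.

```python
import math
from statistics import mean, median


def shrunken_centroid_scores(X, y, delta):
    """Return {class: [shrunken score per gene]} for samples X (rows) and labels y.

    Assumption for this sketch: at least one gene has non-zero within-class
    variance, so the denominator below is never zero.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    rows = {k: [x for x, lab in zip(X, y) if lab == k] for k in classes}

    # Overall and per-class centroids, one value per gene.
    overall = [mean(x[i] for x in X) for i in range(p)]
    centroid = {k: [mean(x[i] for x in rows[k]) for i in range(p)] for k in classes}

    # Pooled within-class standard deviation per gene.
    s = []
    for i in range(p):
        ss = sum((x[i] - centroid[k][i]) ** 2 for k in classes for x in rows[k])
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # offset that damps genes with very small variance

    scores = {}
    for k in classes:
        m_k = math.sqrt(1 / len(rows[k]) - 1 / n)
        shrunk = []
        for i in range(p):
            d = (centroid[k][i] - overall[i]) / (m_k * (s[i] + s0))
            # Soft-thresholding: move d towards zero by delta, stopping at zero.
            shrunk.append(math.copysign(max(abs(d) - delta, 0.0), d))
        scores[k] = shrunk
    return scores


def selected_genes(X, y, delta):
    """Indices of genes whose shrunken score is non-zero for at least one class."""
    scores = shrunken_centroid_scores(X, y, delta)
    return [i for i in range(len(X[0])) if any(scores[k][i] != 0.0 for k in scores)]


# Hypothetical toy data: 6 tissues (rows), 3 genes (columns), 2 classes.
X_toy = [
    [5.0, 1.0, 2.0],
    [6.0, 1.2, 2.1],
    [5.5, 0.9, 1.9],
    [1.0, 1.1, 2.0],
    [0.5, 1.0, 2.2],
    [1.5, 0.8, 1.8],
]
y_toy = ["A", "A", "A", "B", "B", "B"]

print(selected_genes(X_toy, y_toy, delta=1.0))
```

In practice delta would be tuned by cross-validation, and the selection bias mentioned in the abstract is the reason the gene selection step should be repeated inside each cross-validation fold rather than performed once on all tissues beforehand.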

 
Citation counts: Cited 2 times in Thomson Reuters Web of Science
Created: Thu, 08 Apr 2010, 00:07:36 EST by Kay Mackie on behalf of Institute for Molecular Bioscience