A sparse PLS for variable selection when integrating omics data

Lê Cao, Kim-Anh, Rossouw, Debra, Robert-Granié, Christèle and Besse, Philippe (2008) A sparse PLS for variable selection when integrating omics data. Statistical Applications in Genetics and Molecular Biology, 7(1): Article 35, 2-32. doi:10.2202/1544-6115.1390


Author Lê Cao, Kim-Anh
Rossouw, Debra
Robert-Granié, Christèle
Besse, Philippe
Title A sparse PLS for variable selection when integrating omics data
Journal name Statistical Applications in Genetics and Molecular Biology
ISSN 1544-6115
Publication date 2008-01
Sub-type Article (original research)
DOI 10.2202/1544-6115.1390
Open Access Status File (Publisher version)
Volume 7
Issue 1
Start page 35-2
End page 35-32
Total pages 31
Place of publication Berkeley, CA, U.S.A.
Publisher Berkeley Electronic Press
Language eng
Abstract Recent biotechnology advances allow for multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets, to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data that are measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for a predictive analysis to deal with these newly arisen problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for highly dimensional data sets. Copyright ©2008 The Berkeley Electronic Press. All rights reserved.
Keyword Joint analysis
Two-block data set
Multivariate regression
Dimension reduction
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ
Additional Notes Article No. 35, pp. 2-32

Document type: Journal Article
Sub-type: Article (original research)
Collections: Excellence in Research Australia (ERA) - Collection
Institute for Molecular Bioscience - Publications
Citation counts: TR Web of Science Citation Count: Cited 64 times in Thomson Reuters Web of Science
Scopus Citation Count: Cited 0 times in Scopus
Created: Fri, 28 May 2010, 11:37:25 EST by Mary-Anne Marrington on behalf of Institute for Molecular Bioscience