Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems

Lê Cao, Kim-Anh, Boitard, Simon and Besse, Philippe (2011) Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics, 12(22): Article number 253. doi:10.1186/1471-2105-12-253


Author Lê Cao, Kim-Anh
Boitard, Simon
Besse, Philippe
Title Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2011-06-01
Sub-type Article (original research)
DOI 10.1186/1471-2105-12-253
Open Access Status DOI
Volume 12
Issue 22
Start page Article number 253
Total pages 16
Place of publication London, United Kingdom
Publisher BioMed Central
Collection year 2012
Language eng
Formatted abstract
Background:
Variable selection on high-throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), has become necessary to extract relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas machine learning wrapper approaches can be used for predictive purposes. When many variables are highly correlated, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits.

Results:
A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework.
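
The idea of a sparse PLS extension for classification can be illustrated with a minimal NumPy sketch. It is not the authors' code or the mixOmics implementation: it assumes the common formulation in which class labels are dummy-coded into an indicator matrix, the cross-product of the centered data and indicator matrices is decomposed, and the variable loading vector is soft-thresholded so that only a chosen number of variables stay non-zero. The function names `soft_threshold` and `spls_da_component` and the parameter `keep` are hypothetical, and only a single component is computed (no deflation, no tuning).

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink each entry of z toward zero by lam; entries with |z| <= lam become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def spls_da_component(X, labels, keep, n_iter=100, tol=1e-8):
    """One sparse PLS-DA-style component (illustrative sketch, hypothetical API).

    X      : (n_samples, n_variables) data matrix
    labels : (n_samples,) class labels
    keep   : number of variables allowed a non-zero loading (>= 1)
    Returns the sparse, unit-norm loading vector u and the sample scores Xc @ u.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Dummy-code the classes: one indicator column per class.
    Y = (labels[:, None] == classes[None, :]).astype(float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    M = Xc.T @ Yc  # (n_variables, n_classes) cross-product matrix

    # Start from the leading singular vectors of M.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]

    for _ in range(n_iter):
        z = M @ v
        # Threshold at the (keep+1)-th largest |z| so at most `keep` entries survive.
        lam = np.sort(np.abs(z))[::-1][keep] if keep < z.size else 0.0
        u_new = soft_threshold(z, lam)
        norm = np.linalg.norm(u_new)
        if norm == 0.0:
            # Degenerate case (ties at the threshold): keep the previous loading.
            break
        u_new = u_new / norm
        v = M.T @ u_new
        v = v / np.linalg.norm(v)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u, Xc @ u
```

Calling `spls_da_component(X, labels, keep=5)` on a samples-by-variables matrix returns a loading vector whose non-zero entries mark the selected variables, together with one score per sample that could be plotted to inspect class separation.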

Conclusions:
sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
Keyword Partial Least-Squares
Gene-Expression Data
Canonical Correlation-Analysis
Tumor Classification
Cancer Classification
Principal Components
Dimension Reduction
Variable Selection
R Package
Regression
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Collections: Official 2012 Collection
Institute for Molecular Bioscience - Publications
 
Citation counts: TR Web of Science Citation Count: cited 10 times in Thomson Reuters Web of Science
Scopus Citation Count: cited 89 times in Scopus
Created: Sun, 14 Aug 2011, 10:31:42 EST by System User on behalf of Institute for Molecular Bioscience