Variable selection for generalized canonical correlation analysis

Tenenhaus, Arthur, Philippe, Cathy, Guillemot, Vincent, Le Cao, Kim-Anh, Grill, Jacques and Frouin, Vincent (2014) Variable selection for generalized canonical correlation analysis. Biostatistics, 15(3): 569-583. doi:10.1093/biostatistics/kxu001

Author Tenenhaus, Arthur
Philippe, Cathy
Guillemot, Vincent
Le Cao, Kim-Anh
Grill, Jacques
Frouin, Vincent
Title Variable selection for generalized canonical correlation analysis
Journal name Biostatistics
ISSN 1468-4357
Publication date 2014-07-03
Year available 2014
Sub-type Article (original research)
DOI 10.1093/biostatistics/kxu001
Open Access Status Not Open Access
Volume 15
Issue 3
Start page 569
End page 583
Total pages 15
Place of publication Oxford, United Kingdom
Publisher Oxford University Press
Language eng
Formatted abstract
Regularized generalized canonical correlation analysis (RGCCA) is a generalization of regularized canonical correlation analysis to three or more sets of variables. RGCCA is a component-based approach that aims to study the relationships between several sets of variables. The quality and interpretability of the RGCCA components are likely to be affected by the usefulness and relevance of the variables in each block. It is therefore important to identify, within each block, which subsets of significant variables are active in the relationships between blocks. In this paper, RGCCA is extended to address the issue of variable selection. Specifically, sparse generalized canonical correlation analysis (SGCCA) is proposed to combine RGCCA with an ℓ1-penalty in a unified framework. Within this framework, blocks are not necessarily fully connected, which makes SGCCA a flexible method for analyzing a wide variety of practical problems. Finally, the versatility and usefulness of SGCCA are illustrated on a simulated dataset and on a 3-block dataset that combines gene expression, comparative genomic hybridization, and a qualitative phenotype measured on a set of 53 children with glioma. SGCCA is available on CRAN as part of the RGCCA package.
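As an illustration of why an ℓ1-penalty leads to variable selection, the sketch below applies soft-thresholding to a weight vector and then rescales it to unit length. This is a generic device common to ℓ1-penalized methods, shown under simplifying assumptions; it is not the SGCCA algorithm from the paper and does not use the RGCCA package. The function names, the example vector, and the threshold `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    # Shrink every entry toward zero by lam; entries with |v_i| <= lam become exactly zero.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_unit_weights(v, lam):
    # Soft-threshold, then rescale to unit Euclidean norm (left as zeros if everything is removed).
    s = soft_threshold(v, lam)
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s

# Hypothetical weights for a block of five variables (illustrative values only).
v = np.array([0.9, -0.05, 0.4, 0.02, -0.6])
a = sparse_unit_weights(v, lam=0.1)

# Indices of variables that keep a nonzero weight, i.e. the "selected" variables.
selected = np.flatnonzero(a)
print(selected)
```

A larger `lam` zeroes more weights, so fewer variables in the block contribute to the component.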
Keyword Generalized canonical correlation analysis
Multiblock data analysis
Variable selection
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Collections: Official 2015 Collection
Institute for Molecular Bioscience - Publications
Citation counts: Thomson Reuters Web of Science: cited 20 times
Scopus: cited 20 times
Created: Tue, 08 Jul 2014, 11:49:13 EST by System User on behalf of Institute for Molecular Bioscience