Dimensionality reduction through clustering of variables and canonical correlation

Bibliographic Details
Published in: Journal of the Korean Statistical Society, 2024-09
Main Authors: Muñoz-Pichardo, Juan M.; Pino-Mejías, Rafael; Cubiles-de-la-Vega, M. Dolores; Enguix-González, Alicia
Format: Article
Language: English
Description
Summary: Dimensionality reduction techniques are useful statistical tools for analyzing datasets from many scientific fields. In addition to reducing the number of variables or cases, these methods often yield interpretable and informative variables or dimensions. This work proposes a new technique for reducing the number of variables in a dataset. The procedure combines Variable Cluster Analysis and Canonical Correlation Analysis to determine synthetic variables that are representative of the clusters. Its design leads to the definition of a homogeneity index based on the statistical dependence within each cluster and, based on this index, to a proposed measure of the adequacy of the obtained cluster structure. Several artificial datasets are generated to illustrate the technique's ability to detect the dependence structure between variables and to reduce dimensionality. Because the technique can be applied to datasets whose dimension exceeds the sample size, it is also illustrated on a high-dimensional dataset with a small sample size. Finally, its application to two real datasets is illustrated.
ISSN: 1226-3192, 2005-2863
DOI: 10.1007/s42952-024-00290-3
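
Illustrative sketch: the summary describes grouping variables into clusters and using canonical correlation to capture dependence. The Python code below is only a rough illustration of those two ingredients and is not the authors' procedure. The function names `cluster_variables` and `first_canonical_correlation`, the dissimilarity 1 - |correlation|, the average-linkage choice, and the simulated data are all assumptions made for this example. It does not compute the homogeneity index or the synthetic variables defined in the article, and the QR-based canonical correlation assumes more cases than variables in each block.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(data, n_clusters):
    """Group the columns of `data` by average-linkage clustering.

    Assumption for this sketch: dissimilarity is 1 - |Pearson correlation|.
    """
    corr = np.corrcoef(data, rowvar=False)
    dissim = 1.0 - np.abs(corr)
    np.fill_diagonal(dissim, 0.0)          # remove floating-point residue
    dissim = (dissim + dissim.T) / 2.0     # enforce exact symmetry
    tree = linkage(squareform(dissim, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def first_canonical_correlation(x, y):
    """Largest canonical correlation between two blocks of columns.

    Uses orthonormal bases (QR) of the centered blocks; the singular values
    of Qx^T Qy are the canonical correlations when both blocks have full
    column rank and there are more rows than columns.
    """
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    singular_values = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(singular_values[0], 1.0))


# Simulated data (hypothetical): two independent latent factors,
# each observed through three noisy variables.
rng = np.random.default_rng(0)
n_cases = 200
factors = rng.normal(size=(n_cases, 2))
block_a = factors[:, [0]] + 0.3 * rng.normal(size=(n_cases, 3))
block_b = factors[:, [1]] + 0.3 * rng.normal(size=(n_cases, 3))
data = np.hstack([block_a, block_b])

labels = cluster_variables(data, n_clusters=2)

# Dependence between the two recovered clusters.
cluster_1 = data[:, labels == labels[0]]
cluster_2 = data[:, labels == labels[3]]
between = first_canonical_correlation(cluster_1, cluster_2)

# Dependence inside one cluster: first variable versus the remaining ones.
within = first_canonical_correlation(cluster_1[:, :1], cluster_1[:, 1:])

print("cluster labels:", labels)
print("canonical correlation between clusters:", round(between, 3))
print("canonical correlation within cluster:", round(within, 3))
```

In this simulated setting the within-cluster canonical correlation should be much larger than the between-cluster one, which is the kind of dependence structure the summary says the technique is meant to detect.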