Dimensionality reduction through clustering of variables and canonical correlation
Published in: | Journal of the Korean Statistical Society 2024-09 |
Main Authors: | , , , |
Format: | Article |
Language: | English |
Summary: | Dimensionality reduction techniques are highly useful statistical tools for analyzing datasets from various scientific fields. In addition to reducing the number of variables or cases, these methods often yield interpretable and informative variables or dimensions. This work proposes a new technique for reducing the number of variables in a dataset. The procedure combines Variable Cluster Analysis and Canonical Correlation Analysis to determine synthetic variables that are representative of the clusters. The design of the procedure leads to the definition of a homogeneity index based on the statistical dependence within each cluster, and this index is then used to propose a measure of the adequacy of the obtained cluster structure. Various artificial datasets have been generated to illustrate the technique's ability to detect the dependence structure between variables and to reduce dimensionality. Because the technique can be applied to datasets whose dimension exceeds the sample size, it is illustrated on a dataset with exactly this issue: high-dimensional data with a small sample size. Furthermore, the application of the technique to two real datasets is illustrated. |
ISSN: | 1226-3192; 2005-2863 |
DOI: | 10.1007/s42952-024-00290-3 |
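The summary describes the procedure only at a high level, so the following is a rough, stdlib-only Python sketch of the general idea. It is not the authors' method. Every name in it (`cluster_variables`, `homogeneity`, `adequacy`, and the others) is hypothetical. The clustering step is a swapped-in average-linkage agglomerative clustering on the dissimilarity 1 - |r|. The "homogeneity index" here is an assumed stand-in: the mean, over the variables of a cluster, of the squared first canonical correlation between that variable and the rest of its cluster. With a single variable on one side, this quantity reduces to the squared multiple correlation r' R_xx^{-1} r. The "adequacy" score is simply a size-weighted mean of those indices. The sketch does not construct the paper's synthetic variables.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix of the columns of `data` (a list of rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(p)]
    return [[sum(r[i] * r[j] for r in centered) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def squared_canonical_corr(corr, target, others):
    """Squared first canonical correlation between one variable and a set.

    With one variable on one side this equals R^2 = r' R_xx^{-1} r.
    """
    r = [corr[target][j] for j in others]
    rxx = [[corr[i][j] for j in others] for i in others]
    beta = solve(rxx, r)
    return sum(b * c for b, c in zip(beta, r))


def homogeneity(corr, cluster):
    """Hypothetical index: mean squared canonical correlation of each
    variable with the remaining variables of its cluster."""
    if len(cluster) == 1:
        return 1.0  # convention chosen for this sketch only
    scores = [squared_canonical_corr(corr, v, [u for u in cluster if u != v])
              for v in cluster]
    return sum(scores) / len(scores)


def adequacy(corr, clusters):
    """Hypothetical adequacy score: cluster-size-weighted mean homogeneity."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) * homogeneity(corr, c) for c in clusters) / total


def cluster_variables(corr, k):
    """Average-linkage agglomerative clustering of variables using the
    dissimilarity 1 - |r|, merging until k clusters remain."""
    clusters = [[j] for j in range(len(corr))]

    def dist(a, b):
        return sum(1 - abs(corr[i][j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    # Artificial data: two latent factors, each driving three noisy variables.
    rng = random.Random(0)
    rows = []
    for _ in range(200):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([f1 + 0.3 * rng.gauss(0, 1) for _ in range(3)] +
                    [f2 + 0.3 * rng.gauss(0, 1) for _ in range(3)])
    corr = correlation_matrix(rows)
    clusters = cluster_variables(corr, 2)
    for c in clusters:
        print(c, round(homogeneity(corr, c), 3))
    print("adequacy:", round(adequacy(corr, clusters), 3))
```

Because a singleton cluster is assigned homogeneity 1.0 by convention, this toy adequacy score always favors splitting into singletons. It cannot choose the number of clusters on its own, and a real procedure would need a penalty or a different criterion. Note also that `solve` fails on perfectly collinear variables. This matters in the high-dimensional, small-sample setting the summary mentions, where the paper's actual construction should be consulted.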