Generic Feature Selection with Short Fat Data
| Published in: | Journal of the Indian Society of Agricultural Statistics 2014, Vol. 68(2), p. 145-162 |
|---|---|
| Main Authors: | , |
| Format: | Article |
| Language: | English |
| Summary: | Consider a regression problem in which there are many more explanatory variables than data points, i.e., p ≫ n. Essentially, without reducing the number of variables, inference is impossible. So, we group the explanatory variables into blocks by clustering, evaluate statistics on the blocks, and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of n, p, classes of statistics, clustering algorithms, penalty terms, and data types. When n is not large, the discrimination over the number of statistics is weak, but computations suggest regressing on approximately [K/2] statistics, where K is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an ℓq norm with high enough q. |
| ISSN: | 0019-6363 |
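
The Summary above outlines a concrete pipeline: cluster the p explanatory variables into K blocks, compute a statistic for each block, and regress the response on those block statistics under a penalized error criterion. Below is a minimal sketch of that general idea, not the authors' procedure; the specific choices (scikit-learn's FeatureAgglomeration for the clustering, block means as the statistics, an ℓ1 penalty via Lasso, and the constants n = 50, p = 500, K = 20) are illustrative assumptions rather than anything taken from the paper.

```python
# Minimal sketch (not the paper's code) of block-statistics-then-penalized-regression:
# cluster the p variables into K blocks, summarize each block by a statistic, and
# regress y on those statistics under a penalty. All concrete choices are assumptions.
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Short fat data: many more explanatory variables than data points (p >> n).
n, p = 50, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 2.0                      # only a few variables actually matter
y = X @ beta + rng.standard_normal(n)

# Step 1: group the p variables into K blocks by clustering the columns.
K = 20
blocks = FeatureAgglomeration(n_clusters=K)

# Step 2: evaluate a statistic on each block (here, the within-block mean,
# which is FeatureAgglomeration's default pooling).
Z = blocks.fit_transform(X)          # shape (n, K)

# Step 3: regress y on the block statistics under a penalized error criterion (l1 here).
model = Lasso(alpha=0.1).fit(Z, y)

print("block-level coefficients:", np.round(model.coef_, 2))
```

Swapping the Lasso for another penalized fit (e.g., ridge or elastic net), or replacing the block means with other block-level statistics, gives the kind of variation over penalty terms and classes of statistics that the abstract reports comparing.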