Using elastic net regression to perform spectrally relevant variable selection

Bibliographic Details
Published in: Journal of Chemometrics, 2018-08, Vol. 32 (8), p. n/a
Main Authors: Giglio, Cannon; Brown, Steven D.
Format: Article
Language: English
Description
Summary: Multivariate data such as spectra frequently contain measured variables that are uninformative, and removing them requires methods that select the informative variables. Partial least squares (PLS) regression may incorporate information from uninformative measured variables, and so it is important to select variables before performing the PLS regression. Elastic net (EN) regression can be used to perform variable selection automatically. An EN regression can be used to select groups of correlated variables or to select either sparse or nonsparse sets of variables. However, the predictive performance of the EN regression can be significantly worse than that of competing 1‐step variable selection methods such as variable importance in projection (VIP). In the present work, the use of the EN to select variables, followed by conventional PLS regression on the selected variables (EN‐PLS), has been investigated. Variable selection by using EN‐PLS was compared with that from EN regression, sparse PLS regression, VIP, and selectivity ratio selection on 2 data sets of visible/near‐infrared spectra. In all cases, the wavelengths selected were compared with reference data. The variables selected by using EN‐PLS offered advantages in interpretability and gave more robust prediction performance as compared with those obtained from full‐spectrum PLS and the other variable selection methods.

This paper reports a method for variable selection by using an EN regression prior to a second regression by using PLS, a 2‐step method termed EN‐PLS. Variables selected by using EN‐PLS are compared with variables selected from the EN regression, as well as VIP, selectivity ratio, and the sparse PLS regression, 3 commonly used methods for variable selection in chemometrics. The EN‐PLS is shown to select variables that were more easily interpreted. In addition, EN‐PLS performed more robustly than a PLS regression performed on all variables, as well as reduced PLS regressions by using variables selected from either the sparse PLS regression algorithm or a VIP variable selection followed by PLS modeling.

A novel combination of elastic net regression and partial least squares regression (EN‐PLS) is compared with variable importance in projection (VIP) and selectivity ratio methods for selection of important variables from near infrared spectra. EN‐PLS was found to select spectrally relevant variables and to give better quantitative performance than partial
ISSN: 0886-9383, 1099-128X
DOI: 10.1002/cem.3034