Three-step hybrid strategy towards efficiently selecting variables in multivariate calibration of near-infrared spectra
Published in: | Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy, 2020-01, Vol.224, p.117376, Article 117376 |
---|---|
Main Authors: | , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Variable (feature or wavelength) selection is a critical step in multivariate calibration of near-infrared (NIR) spectra. High-resolution NIR spectrometers and NIR imaging instruments usually generate hundreds or thousands of wavelengths, so variable selection methods run a high risk of overfitting, become inefficient, or demand large computational resources. It is therefore a great challenge to efficiently select informative variables and obtain an optimal variable combination in a huge variable space. We propose a hybrid strategy for efficiently selecting variables in three steps: rough selection, fine selection and optimal selection. First, a highly interpretable wavelength interval selection method (interval partial least squares, iPLS) was used to roughly select informative intervals and shrink the variable space. Second, wavelength point selection methods such as variable importance in projection (VIP) and modified variable combination population analysis (mVCPA) were used to shrink the variable space further, from large to small, so that only the most important variables remained. Third, optimization methods such as iteratively retaining informative variables (IRIV) and genetic algorithm (GA) were applied to find an optimal variable combination among the remaining variables. The strategy makes full use of the advantages of the methods involved and compensates for their disadvantages when facing high-dimensional data. Two NIR datasets were employed to investigate the performance of the three-step hybrid strategy. It significantly improved the prediction performance of the models compared with other single or hybrid methods (iPLS, VIP, iPLS-VIP, iPLS-VCPA, iPLS-mVCPA, VIP-GA, VIP-IRIV, mVCPA-GA, mVCPA-IRIV), indicating that the three-step hybrid strategy, including iPLS-VIP-IRIV, iPLS-VIP-GA, iPLS-mVCPA-GA and iPLS-mVCPA-IRIV, could efficiently select informative variables.
Therefore, the three-step hybrid strategy is a good alternative among variable selection methods for high-dimensional NIR spectral data.
•A hybrid variable selection strategy based on three steps (rough selection, fine selection and optimal selection) was proposed.
•The hybrid strategy makes full use of the advantages of each selected single method.
•It provides better prediction performance than the single methods and the combinations of two kinds of methods. |
---|---|
ISSN: | 1386-1425 |
DOI: | 10.1016/j.saa.2019.117376 |
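
The three-step flow described in the Summary (rough interval selection, fine point selection, optimal combination search) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names (`pls1_fit`, `rmsecv`, `vip_scores`, `three_step_select`), the parameter values, the synthetic data, and the choice to keep a fixed number of intervals and top-ranked VIP variables. In particular, step 3 uses a plain greedy backward elimination as a simple stand-in for an optimizer; it is not IRIV or GA, and mVCPA is not implemented.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Fit single-response PLS by NIPALS; returns means, coefficients, W, T, q."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    n_comp = min(n_comp, X.shape[1], X.shape[0] - 1)
    W, P, T, q = [], [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left to explain
            break
        w /= norm                     # unit-norm weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w); P.append(p); T.append(t); q.append(c)
    W, P, T, q = np.array(W).T, np.array(P).T, np.array(T).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta, W, T, q

def rmsecv(X, y, n_comp=3, folds=5):
    """Root-mean-square error of k-fold cross-validation (interleaved folds)."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for k in range(folds):
        test = idx[k::folds]
        train = np.setdiff1d(idx, test)
        x_mean, y_mean, beta = pls1_fit(X[train], y[train], n_comp)[:3]
        pred = (X[test] - x_mean) @ beta + y_mean
        sq_err += np.sum((y[test] - pred) ** 2)
    return np.sqrt(sq_err / len(y))

def vip_scores(X, y, n_comp=3):
    """Variable importance in projection for a PLS1 model."""
    _, _, _, W, T, q = pls1_fit(X, y, n_comp)
    ss = q ** 2 * np.sum(T ** 2, axis=0)   # y-variance explained per component
    return np.sqrt(X.shape[1] * (W ** 2 @ ss) / ss.sum())

def three_step_select(X, y, n_intervals=10, keep_intervals=3, keep_vip=10, n_comp=3):
    # Step 1 (rough): iPLS-style, score each wavelength interval by RMSECV
    # and keep the best few intervals.
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    scores = [rmsecv(X[:, iv], y, n_comp) for iv in intervals]
    best = np.argsort(scores)[:keep_intervals]
    rough = np.sort(np.concatenate([intervals[i] for i in best]))
    # Step 2 (fine): rank the retained variables by VIP and keep the top ones.
    vip = vip_scores(X[:, rough], y, n_comp)
    fine = np.sort(rough[np.argsort(vip)[::-1][:keep_vip]])
    # Step 3 (optimal): greedy backward elimination by RMSECV.
    # This is a stand-in optimizer, NOT IRIV or a genetic algorithm.
    selected = list(fine)
    best_err = rmsecv(X[:, selected], y, n_comp)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for v in list(selected):
            trial = [s for s in selected if s != v]
            err = rmsecv(X[:, trial], y, n_comp)
            if err < best_err:
                best_err, selected, improved = err, trial, True
                break
    return rough, fine, np.array(selected), best_err

# Synthetic "spectra" (hypothetical data): only columns 45 and 47 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
y = X[:, 45] + 0.5 * X[:, 47] + 0.1 * rng.normal(size=60)
rough, fine, final, err = three_step_select(X, y)
print(len(rough), len(fine), len(final), round(float(err), 3))
```

Each step searches only inside the variables retained by the previous one, so the costly combination search in step 3 runs over a handful of candidates rather than the full spectrum, which is the efficiency argument the Summary makes.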