Cost-efficient unsupervised sample selection for multivariate calibration
Published in: | Chemometrics and Intelligent Laboratory Systems 2021-08, Vol.215, p.104352, Article 104352 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Indirect quantification of chemical composition through spectral measurements requires the establishment of multivariate calibration models. The reference analyses on the calibration samples typically form a major cost factor in establishing these models. Therefore, the aim of this study was to select the most informative calibration samples in an unsupervised way, based only on the spectral measurements. To this end, we developed guidelines for sample selection in PLSR model building. The recommendations are to set a sample size that exceeds the model complexity by a factor of 12, to perform the selection in the PCA score space spanned by a sufficiently large number of principal components, and to use selection methods such as Kennard-Stone, Puchwein, clustering or D-optimal designs. We provide the data and methodology used in the present study for future use.
• We aim to find a subset that renders a model of comparable performance to the model built on the full set of samples.
• The optimal sample size for a multivariate calibration model depends on the complexity of that model.
• The quality of the selected samples can be evaluated by comparing covariance matrices for PLSR calibration models.
• Control over the sample size and the input dimensionality lowers the impact of the selection method. |
---|---|
ISSN: | 0169-7439, 1873-3239 |
DOI: | 10.1016/j.chemolab.2021.104352 |
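
The selection step recommended in the summary (Kennard-Stone in a PCA score space, with a sample size tied to model complexity) can be sketched in Python with NumPy. This is a minimal sketch and not the authors' released code. The helper names `calibration_sample_size`, `pca_scores` and `kennard_stone` are hypothetical. "Model complexity" is assumed to mean the number of PLS latent variables, and the number of principal components and the synthetic data used below are arbitrary illustrative choices.

```python
import numpy as np


def calibration_sample_size(n_latent_variables, factor=12):
    """Hypothetical helper: sample size as `factor` times the model complexity.

    Assumes "model complexity" means the number of PLS latent variables.
    """
    return factor * n_latent_variables


def pca_scores(X, n_components):
    """Scores of mean-centred spectra X (samples x wavelengths) on the first PCs, via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def kennard_stone(T, n_select):
    """Classic Kennard-Stone selection on the rows of T using Euclidean distance.

    Starts from the two most distant samples, then repeatedly adds the sample whose
    distance to its nearest already-selected sample is largest. Returns row indices
    in selection order.
    """
    n = T.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples in the score space.
    D = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    # Distance from every sample to its nearest selected sample.
    min_dist = np.minimum(D[i], D[j])
    min_dist[selected] = -np.inf  # never pick an already-selected sample again
    while len(selected) < n_select:
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, D[k])
        min_dist[selected] = -np.inf
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))      # synthetic stand-in for 200 spectra
    n_cal = calibration_sample_size(3)  # assumes a 3-latent-variable PLSR model
    T = pca_scores(X, n_components=10)  # illustrative number of PCs
    idx = kennard_stone(T, n_cal)
    print(n_cal, len(idx))
```

Puchwein, clustering or D-optimal selection could take the place of `kennard_stone` at the same step; those methods are not shown here. Selecting in the score space rather than on the raw spectra keeps the distance computation cheap and lets the number of principal components control the input dimensionality, which the highlights identify as a factor that lowers the impact of the selection method.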