Cost-efficient unsupervised sample selection for multivariate calibration

Bibliographic Details
Published in: Chemometrics and intelligent laboratory systems 2021-08, Vol. 215, p. 104352, Article 104352
Main Authors: Fonseca Diaz, Valeria, De Ketelaere, Bart, Aernouts, Ben, Saeys, Wouter
Format: Article
Language: English
Description
Summary: Indirect quantification of chemical composition through spectral measurements requires the establishment of multivariate calibration models. The reference analyses on the calibration samples typically form a major cost factor in the establishment of these multivariate models. Therefore, the aim of this study was to select the most informative calibration samples in an unsupervised way based on the spectral measurements. To this end, guidelines to address this challenge in PLSR model building have been developed. The recommendations include calculating a sample size that surpasses the model complexity by a factor of 12, performing the selection in the PCA score space spanned by a sufficiently large number of principal components and using methods such as Kennard-Stone, Puchwein, Clustering or D-optimal designs. We provide the data and methodology used in the present study for future use.
• We aim to find a subset that would render a model of comparable performance to the model rendered by the full set of samples.
• The optimal sample size for a multivariate calibration model depends on the complexity of such a model.
• It is possible to evaluate the quality of the selected samples by comparing covariance matrices for PLSR calibration models.
• Control over the sample size and the input dimensionality lowers the impact of the selection method.
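The recommendations in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "model complexity" means the number of PLSR latent variables, so that the sample size is 12 times that number, and it uses Kennard-Stone as one of the listed selection methods. The function names (pca_scores, kennard_stone, select_calibration_samples), the parameter names and the random stand-in spectra are hypothetical and chosen for illustration only.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centered spectra onto the first principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the scores are U * s
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def kennard_stone(T, n_select):
    """Kennard-Stone selection on the rows of T (Euclidean distance)."""
    n = T.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Full pairwise distance matrix (fine for a few thousand samples)
    d = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    # Start with the two samples that are farthest apart
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    # Distance of every sample to its nearest already-selected sample
    min_dist = np.minimum(d[i], d[j])
    while len(selected) < n_select:
        min_dist[selected] = -np.inf  # never pick a sample twice
        k = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(k)
        min_dist = np.minimum(min_dist, d[k])
    return selected


def select_calibration_samples(X, n_latent_variables, n_components, factor=12):
    """Pick factor * n_latent_variables samples in PCA score space.

    Assumption: model complexity = number of PLSR latent variables.
    """
    n_select = factor * n_latent_variables
    T = pca_scores(X, n_components)
    return kennard_stone(T, n_select)


# Hypothetical usage with random numbers standing in for spectra
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # 200 samples, 50 wavelengths
idx = select_calibration_samples(X, n_latent_variables=3, n_components=10)
print(len(idx))
```

Only the samples in idx would then be sent for reference analysis. The number of principal components is left as an explicit argument because the summary asks only for a "sufficiently large" number without giving a value.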
ISSN: 0169-7439, 1873-3239
DOI: 10.1016/j.chemolab.2021.104352