Improving the predictions of soil properties from VNIR–SWIR spectra in an unlabeled region using semi-supervised and active learning
Published in: | Geoderma 2021-04, Vol.387, p.114830, Article 114830 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • The semi-supervised learning framework of LapSVR is presented. • LapSVR outperforms other supervised techniques in VNIR–SWIR spectroscopy. • The active learning framework is presented along with a novel strategy. • Active learning identifies samples from the unknown region that ought to be labeled. • The proposed methodologies statistically outperform their counterparts.
Monitoring the status of the soil ecosystem, both to identify the spatio-temporal extent of the pressures exerted on it and to mitigate the effects of climate change and land degradation, requires reliable and cost-effective solutions. To address this need, soil spectroscopy in the visible, near-infrared, and shortwave-infrared (VNIR–SWIR) range has emerged as a viable alternative to traditional analytical approaches. Accordingly, large-scale soil spectral libraries coupled with advanced machine learning tools have been developed to infer soil properties from hyperspectral signatures. However, models developed for one region may perform worse when applied to a new region unseen by the model, owing to the large inherent variability of soils (e.g. pedogenetic differences and diverse soil types). Given an existing spectral library with labeled data and a new unlabeled region (i.e. one where no soil samples have been analytically measured), the question becomes how best to develop a model that more accurately predicts the soil properties of the unlabeled region.
This paper proposes a machine learning technique that combines semi-supervised learning, which exploits the distribution of the predictors in the unlabeled dataset, with active learning, which selects a small set of samples from the unlabeled dataset as a spiking subset, in order to develop a more robust model. The semi-supervised learning approach is Laplacian Support Vector Regression (LapSVR), which follows the manifold regularization framework. For the active learning component, the pool-based approach is used because it best matches this use case: it iteratively selects a subset of data from the unlabeled region to spike the calibration set. A novel machine learning–based query strategy is proposed to identify the spiking subset at each iteration. The experimental analysis was conducted using data from the Land Use and Coverage Area Frame Survey of 2009 which covered most of the then member-states of the European Union, and in particul |
---|---|
ISSN: | 0016-7061 1872-6259 |
DOI: | 10.1016/j.geoderma.2020.114830 |
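
The summary above describes two ideas: a manifold-regularized regressor that also uses the unlabeled spectra, and a pool-based loop that repeatedly moves a few samples from the unlabeled region into the calibration set. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. It swaps in Laplacian regularized least squares (squared loss, closed-form solution) for LapSVR, and a simple "farthest from the calibration set" rule for the paper's novel query strategy, which this record does not describe. All function names, parameters, and default values (`fit_laprls`, `spike_calibration_set`, `oracle`, `gamma_a`, `gamma_i`, `width`) are hypothetical.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)

def fit_laprls(X_lab, y_lab, X_unl, gamma_a=1e-2, gamma_i=1e-1, k=5, width=1.0):
    """Laplacian regularized least squares with an RBF kernel.

    Stand-in for LapSVR: same manifold regularization idea, but squared
    loss instead of the epsilon-insensitive loss (assumption).
    """
    X = np.vstack([X_lab, X_unl])
    l, n = len(X_lab), len(X)
    D2 = sq_dists(X, X)
    K = np.exp(-width * D2)
    # Symmetric k-nearest-neighbour graph over labeled and unlabeled samples.
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]  # column 0 is the sample itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), nn.shape[1]), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(1)) - W  # unnormalized graph Laplacian
    # J selects the labeled samples; unlabeled targets are padded with zeros.
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])
    Y = np.r_[y_lab, np.zeros(n - l)]
    A = J @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n ** 2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return lambda X_new: np.exp(-width * sq_dists(X_new, X)) @ alpha

def spike_calibration_set(X_lab, y_lab, X_pool, oracle, n_iter=3, batch=5):
    """Pool-based active learning: label `batch` pool samples per iteration.

    `oracle(indices)` returns the measured values for the chosen pool rows
    (in practice, a laboratory analysis).
    """
    remaining = np.arange(len(X_pool))
    for _ in range(n_iter):
        # Stand-in query strategy (assumption): pick the pool samples that
        # lie farthest from the current calibration set.
        novelty = sq_dists(X_pool[remaining], X_lab).min(axis=1)
        picked = remaining[np.argsort(-novelty)[:batch]]
        X_lab = np.vstack([X_lab, X_pool[picked]])
        y_lab = np.r_[y_lab, oracle(picked)]
        remaining = np.setdiff1d(remaining, picked)
    # The stand-in query rule does not need a model, so fit once at the end.
    model = fit_laprls(X_lab, y_lab, X_pool[remaining])
    return model, X_lab, y_lab, remaining
```

A call such as `spike_calibration_set(X_lab, y_lab, X_pool, oracle)` returns a predictor together with the spiked calibration set and the indices of the pool samples that remain unlabeled. A model-based query strategy, like the one the paper proposes, would presumably require refitting inside the loop.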