Validity evaluation of a machine-learning model for chlorophyll a retrieval using Sentinel-2 from inland and coastal waters

Bibliographic Details
Published in: Ecological Indicators 2022-04, Vol. 137, p. 108737, Article 108737
Main Authors: Woo Kim, Young; Kim, TaeHo; Shin, Jihoon; Lee, Dae-Seong; Park, Young-Seuk; Kim, Yeji; Cha, YoonKyung
Format: Article
Language: English
Description
Summary:
• ML models were developed for Chla retrieval from inland and coastal waters using MSI.
• Light gradient boosting machine (LGBM) outperformed other ML algorithms.
• Post-hoc explanations for LGBM were provided using SHAP.
• Rrs(704)/Rrs(665) was the most important input feature.
• Percent forest within the 500-m buffer zone explained among-lake Chla variations.

The MultiSpectral Instrument (MSI) on board Sentinel-2 provides satellite images at spatiotemporal resolutions suitable for chlorophyll a (Chla) retrieval from inland and coastal waters. Machine-learning (ML) algorithms, including light gradient boosting machine (LGBM), were employed for Chla retrieval from MSI. The study area encompasses 78 lakes and estuaries located across four major river watersheds in South Korea. Matchup data between MSI overpasses and near-concurrent in situ Chla measurements from December 2018 to April 2021 were included. The remote sensing reflectance (Rrs) values of six single spectral bands and four two-band ratios were used as the input features. Despite the difficulty of Chla estimation in optically complex waters, the ML algorithms showed reasonable overall accuracy. Among the ML algorithms, LGBM exhibited the best performance (R² = 0.75, bias = -0.15, slope = 0.73, RMSE = 15.15 mg·m⁻³, MAE = 9.49 mg·m⁻³) over a wide range of trophic states. Post-hoc interpretations of the best-performing LGBM using Shapley additive explanations (SHAP) indicated that Rrs(704)/Rrs(665) was the most important feature, while Rrs(739)/Rrs(704) and Rrs(492)/Rrs(560) played auxiliary roles in Chla retrieval through interaction with Rrs(704)/Rrs(665). Among-lake spatial variations of Chla were explained by percent forest and percent agricultural area within the buffer zone at multiple scales (buffer widths of 50 m and 500 m). The associations between the modeled Chla and buffer land cover types, that is, a decrease in Chla concentration with an increase in percent forest and a decrease in percent agricultural area, were consistent with established ecological knowledge. Overall, the model interpretations and the spatial variations in Chla within and among lakes confirmed the validity of LGBM for retrieving MSI-derived Chla from lakes and estuaries. Our study can serve as a reference for evaluating the validity of complex ML models for inland water remote sensing.
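
As an illustration of the quantities named in the summary, the following Python sketch computes the three two-band ratios mentioned there from Rrs values, along with the five reported error metrics. It is not the authors' code. The function names, the dictionary layout keyed by band wavelength, and the linear-space metric definitions are all assumptions made for this example. The fourth ratio and the six single bands are not listed in this record, so they are left out.

```python
import numpy as np

# Band pairs (numerator, denominator) in nm, taken from the ratios named in the summary.
RATIOS = [(704, 665), (739, 704), (492, 560)]


def band_ratio_features(rrs):
    """Compute two-band Rrs ratios.

    `rrs` is a hypothetical layout: a dict mapping band wavelength (nm)
    to an array-like of Rrs values, one value per matchup sample.
    """
    features = {}
    for num, den in RATIOS:
        numerator = np.asarray(rrs[num], dtype=float)
        denominator = np.asarray(rrs[den], dtype=float)
        # Non-positive denominators yield NaN instead of a division error.
        safe_den = np.where(denominator > 0, denominator, np.nan)
        features[f"Rrs({num})/Rrs({den})"] = numerator / safe_den
    return features


def retrieval_metrics(observed, predicted):
    """Common regression metrics, assumed here to be computed in linear space."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,           # coefficient of determination
        "bias": err.mean(),                     # mean signed error
        "slope": np.polyfit(obs, pred, 1)[0],   # least-squares slope of predicted vs. observed
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
    }
```

The resulting ratio arrays could be stacked with the single-band Rrs values to form a feature matrix for any regressor. The study may define bias and slope differently (for example, on log-transformed Chla), so the metric values from this sketch are not directly comparable to those reported above.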
ISSN: 1470-160X, 1872-7034
DOI: 10.1016/j.ecolind.2022.108737