A validated ensemble method for multinomial land-cover classification

Bibliographic Details
Published in: Ecological Informatics 2020-03, Vol. 56, Article 101065
Main Authors: Diengdoh, Vishesh L., Ondei, Stefania, Hunt, Mark, Brook, Barry W.
Format: Article
Language: English
Description
Summary: Land-cover data provide valuable information for landscape management and can be generated using machine learning algorithms. Ensemble models, or model averaging, can overcome difficulties in selecting an adequate algorithm and can improve model predictions, but their use is limited among ecologists. The objective of this study is to highlight the benefits and limitations of weighted and unweighted majority voting ensemble models for land-cover classification and to enable easier and wider implementation of the method by providing an R script (for use in the R software). Using a case study of three mixed-use landscapes from southern Australia (Tasmania), land cover was classified into six classes using Landsat 8 imagery and ancillary data, with support vector machine, random forest, k-nearest neighbour and naïve Bayes as base algorithms. The predicted classifications of the base algorithms were then combined using both an unweighted and a weighted (using the true skill statistic) majority voting ensemble algorithm. Cross-validation results showed that the base algorithms achieved similar accuracy, making algorithm selection difficult. The base algorithms achieved high and similar predictive accuracy when the classified land cover and the training data belonged to the same geographic region, but lower and more variable predictive accuracy when they belonged to different geographic regions. The weighted and unweighted ensembles achieved similar overall accuracy, equivalent to that of the best-performing base algorithm. We conclude that the majority voting ensemble can be adopted to overcome difficulties in model selection during land-cover classification.
• Using cross-validation resulted in no preferred algorithm for classification.
• Different base algorithms achieved the highest accuracy in different scenarios.
• The weighted and unweighted ensembles consistently achieved high accuracy.
• The ensemble algorithm is provided as a script for use with the R software.
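The summary describes combining base-classifier predictions by unweighted and weighted majority voting, with the true skill statistic (TSS) as the weight. The following is a minimal Python sketch of that general idea and is not the authors' R script. The function names, the one-vs-rest TSS calculation, the alphabetical tie-breaking rule and the toy labels are all assumptions made for illustration; the record does not say how the paper computes per-model weights or resolves ties.

```python
from collections import defaultdict


def true_skill_statistic(y_true, y_pred, positive):
    """One-vs-rest TSS (sensitivity + specificity - 1) for one class.

    Assumption: how the paper aggregates TSS across six classes is not
    stated in this record; this helper handles a single class only.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def majority_vote(predictions, weights=None):
    """Combine per-model label lists into one ensemble label list.

    predictions: dict mapping a (hypothetical) model name to a list of
                 predicted labels, one per pixel or sample.
    weights:     dict mapping model name to a vote weight (for example a
                 TSS score); None gives every model one equal vote.
    """
    names = list(predictions)
    if weights is None:
        weights = {name: 1.0 for name in names}
    n_samples = len(predictions[names[0]])
    ensemble = []
    for i in range(n_samples):
        score = defaultdict(float)
        for name in names:
            score[predictions[name][i]] += weights[name]
        # Assumed tie-break: the alphabetically first label wins.
        ensemble.append(max(sorted(score), key=lambda label: score[label]))
    return ensemble


# Toy example with three hypothetical models and three samples.
preds = {
    "a": ["x", "y", "z"],
    "b": ["x", "z", "z"],
    "c": ["y", "y", "x"],
}
unweighted = majority_vote(preds)
weighted = majority_vote(preds, weights={"a": 0.2, "b": 0.2, "c": 0.9})
```

In this toy case the heavily weighted model "c" overrules the two others in the weighted vote, whereas the unweighted vote follows the simple majority. Because TSS can be negative for a model that performs worse than chance, a practical version would likely need to clip or drop such weights; that choice is also left as an assumption here.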
ISSN: 1574-9541
DOI: 10.1016/j.ecoinf.2020.101065