Predicting species distributions: a critical comparison of the most common statistical models using artificial species
Published in: | Journal of biogeography 2007-08, Vol.34 (8), p.1455-1469 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Summary: | Aim: To test statistical models used to predict species distributions under different shapes of occurrence-environment relationship. We addressed three questions: (1) Is there a statistical technique that has a consistently higher predictive ability than others for all kinds of relationships? (2) How does species prevalence influence the relative performance of models? (3) When an automated stepwise selection procedure is used, does it improve predictive modelling, and are the relevant variables being selected? Location: We used environmental data from a real landscape, the state of California, and simulated species distributions within this landscape. Methods: Eighteen artificial species were generated, which varied in their occurrence response to the environmental gradients considered (random, linear, Gaussian, threshold or mixed), in the interaction of those factors (no interaction vs. multiplicative), and in their prevalence (50% vs. 5%). The landscape was then randomly sampled with a large (n = 2000) or small (n = 150) sample size, and the predictive ability of each statistical approach was assessed by comparing the true and predicted distributions using five different indexes of performance (area under the receiver operating characteristic curve, Kappa, correlation between true and predicted probability of occurrence, sensitivity and specificity). We compared generalized additive models (GAM) with and without flexible degrees of freedom, logistic regressions (generalized linear models, GLM) with and without variable selection, classification trees, and the genetic algorithm for rule-set production (GARP). Results: Species with threshold and mixed responses, additive environmental effects, and high prevalence generated better predictions than did other species for all statistical models. In general, GAM outperformed all other strategies, although differences with GLM were usually not significant.
The two variable-selection strategies presented here did not discriminate successfully between truly causal factors and correlated environmental variables. Main conclusions: Based on our analyses, we recommend the use of GAM or GLM over classification trees or GARP, and the specification of any suspected interaction terms between predictors. An expert-based variable selection procedure was preferable to the automated procedures used here. Finally, for low-prevalence species, variability in model performance is both very high and sample-dependent. This suggests that dist |
---|---|
ISSN: | 0305-0270 1365-2699 |
DOI: | 10.1111/j.1365-2699.2007.01720.x |
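
The five performance indexes named in the Summary (area under the ROC curve, Kappa, correlation between true and predicted probability, sensitivity and specificity) can be illustrated with a small, standard-library Python sketch. This is not the authors' code: the function names, the single Gaussian gradient, its optimum and width, the deliberately shifted "model", and the 0.5 classification threshold are all assumptions made for illustration only.

```python
import math
import random


def gaussian_response(x, optimum=0.5, width=0.15):
    """Hypothetical Gaussian occurrence response along a gradient in [0, 1]."""
    return math.exp(-((x - optimum) ** 2) / (2 * width ** 2))


def confusion(observed, predicted_prob, threshold=0.5):
    """Return (tp, fp, fn, tn) after thresholding predicted probabilities."""
    tp = fp = fn = tn = 0
    for obs, p in zip(observed, predicted_prob):
        pred = p >= threshold
        if obs and pred:
            tp += 1
        elif pred:
            fp += 1
        elif obs:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fp, fn, tn):
    """Fraction of true presences predicted as presences."""
    return tp / (tp + fn)


def specificity(tp, fp, fn, tn):
    """Fraction of true absences predicted as absences."""
    return tn / (tn + fp)


def kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def auc(observed, predicted_prob):
    """AUC as the chance a random presence outscores a random absence (ties = 0.5)."""
    pos = [p for o, p in zip(observed, predicted_prob) if o]
    neg = [p for o, p in zip(observed, predicted_prob) if not o]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    random.seed(0)
    # Small sample (n = 150) of an assumed one-dimensional gradient.
    gradient = [random.random() for _ in range(150)]
    true_prob = [gaussian_response(x) for x in gradient]
    observed = [random.random() < p for p in true_prob]
    # Stand-in "model": the same curve with a slightly misplaced optimum.
    predicted = [gaussian_response(x, optimum=0.55) for x in gradient]
    cells = confusion(observed, predicted)
    print("AUC:", auc(observed, predicted))
    print("Kappa:", kappa(*cells))
    print("Correlation:", pearson(true_prob, predicted))
    print("Sensitivity:", sensitivity(*cells))
    print("Specificity:", specificity(*cells))
```

Sensitivity, specificity and Kappa depend on the chosen threshold, whereas AUC and the correlation are computed directly from the probabilities, which is one reason a study might report several indexes side by side.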