Reducing spatial autocorrelation in Species Distribution Modelling

biodiversityDS.
Aug 5, 2020

Species distribution models (SDMs; for a review and definitions see, e.g., Peterson et al., 2011) are a dominant paradigm for quantifying the relationship between environmental dynamics and several manifestations of species biogeography. These statistical approaches have driven an emerging body of research that describes the global distribution of species, addresses niche-based questions, supports biodiversity conservation and ecosystem-based management, and infers the likely anthropogenic pressures leading to population turnover and extinction.

Spatial autocorrelation (SA) is a common challenge when modelling the distribution and abundance of species. This phenomenon, likely present in most ecological datasets, denotes the situation where values of a variable sampled at nearby locations are not independent of one another (i.e., the value of a predictor variable at a given site can be partially predicted from its values at neighbouring sites).
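
To make the definition concrete, here is a short Python sketch on simulated data (an illustration only, not part of the author's R code). It measures how well the value of a hypothetical predictor at each site is predicted by the value at its nearest neighbouring site. A predictor that varies smoothly in space gives a high correlation; randomly reassigning the same values to sites removes it. The function name nearest_neighbour_correlation and the simulated data are assumptions, and distances are planar rather than geographic.

```python
import numpy as np


def nearest_neighbour_correlation(coords, values):
    """Pearson correlation between each site's value and the value at its nearest other site.

    Illustrative helper (hypothetical name); uses planar Euclidean distances.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Planar (Euclidean) distances between all pairs of sites
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a site is not its own neighbour
    nearest = dist.argmin(axis=1)   # index of the closest other site
    return np.corrcoef(values, values[nearest])[0, 1]


rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(300, 2))  # 300 simulated sites in a 100 x 100 square
# Hypothetical predictor: a smooth north-south gradient plus small local noise
smooth = 0.1 * coords[:, 1] + rng.normal(0, 0.5, size=300)
# The same values randomly reassigned to sites, which removes the spatial structure
shuffled = rng.permutation(smooth)

print(nearest_neighbour_correlation(coords, smooth))    # high: neighbours have similar values
print(nearest_neighbour_correlation(coords, shuffled))  # near 0: no spatial structure
```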

Accounting for SA has received little attention in applied SDM studies; however, when present, SA may result in poorly specified models and inappropriate spatial inference and prediction. Recent studies have proposed incorporating SA directly into the models used to predict distributions (so-called ‘spatial models’; Dormann, 2007); however, this approach does not allow models to be transferred to new, independent data (e.g., for temporal and spatial transferability).

I propose a straightforward approach to reduce the effect of SA in SDMs (see also Boavida et al., 2016 for more details). Below, I use a simple example focused on a brown algal species capable of forming marine forests, together with a set of environmental predictors known to largely explain its distribution.

Get the R code: Reducing spatial autocorrelation
https://github.com/jorgeassis/spatialAutocorrelation
Fig. Initial set of occurrence records, with a potential negative effect of spatial autocorrelation.

1. A correlogram is produced to assess the autocorrelation of each predictor variable across a range of geographic distances.

2. For each distance class, a linear model tests whether this correlation is significant. The first distance class at which the correlation is no longer significant gives the minimum non-significant autocorrelated distance for that predictor.

Fig. Correlogram of a predictor variable across a range of distances (open circles: non-significant correlation).

3. The minimum non-significant distances found for the individual predictors are averaged, and this average distance is used to prune the occurrence records so that only one record is kept within that distance.

Fig. Minimum non-significant autocorrelated distances of the predictor variables and their average (dashed line).
Fig. Final pruned dataset of occurrence records with reduced potential effect of spatial autocorrelation.
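
The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a translation of the R code in the linked repository. Assumptions: within fixed-width distance classes, the predictor value at one record of each pair is regressed on the value at the other record (scipy.stats.linregress); the "minimum non-significant distance" is taken as the lower edge of the first class with p ≥ alpha; class_width, max_distance and alpha = 0.05 are hypothetical choices; distances are Euclidean on projected coordinates (longitude/latitude records would need great-circle distances); and pruning is a greedy pass that keeps a record only if it lies at least the average distance from every record already kept. All function names and the simulated data are hypothetical. Because pairs share records, the p-values are approximate and tend to be optimistic.

```python
import numpy as np
from scipy import stats

# Hypothetical helpers: a sketch of the three steps, not the author's R implementation.


def pairwise_distances(coords):
    """Euclidean distances between all pairs of (projected) coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def min_nonsignificant_distance(coords, values, class_width, max_distance, alpha=0.05):
    """Steps 1-2 for one predictor: walk the correlogram's distance classes and return
    the lower edge of the first class whose correlation is not significant (None if none)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each pair of records once
    d = pairwise_distances(coords)[i, j]
    for lower in np.arange(0.0, max_distance, class_width):
        in_class = (d >= lower) & (d < lower + class_width)
        if in_class.sum() < 3:  # too few pairs to fit a line
            continue
        # Linear model: value at one record of each pair regressed on the value at the other
        fit = stats.linregress(values[i[in_class]], values[j[in_class]])
        if fit.pvalue >= alpha:
            return float(lower)
    return None


def prune_records(coords, min_distance):
    """Step 3: greedy thinning that keeps a record only if it lies at least
    min_distance from every record kept so far (input order matters)."""
    coords = np.asarray(coords, dtype=float)
    kept = []
    for idx, point in enumerate(coords):
        if not kept or np.linalg.norm(coords[kept] - point, axis=1).min() >= min_distance:
            kept.append(idx)
    return np.array(kept, dtype=int)


def reduce_spatial_autocorrelation(coords, predictors, class_width, max_distance, alpha=0.05):
    """Average the per-predictor distances, then prune the records using that distance."""
    coords = np.asarray(coords, dtype=float)
    distances = []
    for values in predictors.values():
        d = min_nonsignificant_distance(coords, values, class_width, max_distance, alpha)
        if d is not None:  # skip predictors still autocorrelated at max_distance
            distances.append(d)
    if not distances:
        raise ValueError("no non-significant distance class found; increase max_distance")
    threshold = float(np.mean(distances))
    return threshold, prune_records(coords, threshold)


# Usage on simulated records (hypothetical data)
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(200, 2))
predictors = {
    "temperature": 0.2 * coords[:, 1] + rng.normal(0, 1, size=200),  # smooth gradient
    "nutrients": rng.normal(0, 1, size=200),                         # spatially random
}
threshold, kept = reduce_spatial_autocorrelation(coords, predictors, class_width=5, max_distance=60)
print(f"pruning distance: {threshold:.1f}; records kept: {len(kept)} of {len(coords)}")
```

Because the greedy pass depends on record order, shuffling the records before pruning (or repeating the pruning with several orders) is one way to check that the retained set is not an artefact of input order.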

Literature cited

Boavida, J., Assis, J., Silva, I. et al. (2016) Overlooked habitat of a vulnerable gorgonian revealed in the Mediterranean and Eastern Atlantic by ecological niche modelling. Scientific Reports, 6, 36460.

Dormann, C.F. (2007) Effects of incorporating spatial autocorrelation into the analysis of species distribution data. Global Ecology and Biogeography, 16, 129–138.

Peterson, A.T., Soberón, J., Pearson, R.G., Anderson, R.P., Martínez-Meyer, E., Nakamura, M. & Araújo, M.B. (2011) Ecological Niches and Geographic Distributions. Monographs in Population Biology, Vol. 49. Princeton University Press, Princeton, 314 pp.

Hi!! I’m Jorge Assis, a Data Scientist, Marine Ecologist, Climate Change Analyst, R and Python Developer based in Portugal [biodiversitydatascience.com]