Spatial bias in the GBIF database and its effect on modelling species’ geographic distributions
AUC, ecological niche modelling, Lepidoptera, Maxent
Species distribution modelling, combined with databases of specimen distribution records, has been advocated as a solution to the limited availability of distributional data in biogeography and ecology. The Global Biodiversity Information Facility (GBIF), a portal that collates digitized collection and survey data, is the largest online provider of distribution records. However, all distributional databases are spatially biased because of uneven sampling effort and uneven data storage and mobilization. Such bias is particularly pronounced in GBIF, where national differences in funding and data sharing lead to large differences in contributions to GBIF. We use a common Eurasian butterfly (Aglais urticae) as an exemplar taxon to provide evidence that the quality of range models declines because of the spatial clustering of distribution records in GBIF. Furthermore, we show that this loss of model quality would go unnoticed with standard methods of model evaluation. Using evaluations of model predictions of the species' Swiss distribution, we compare distribution models built from the full data with models built from data in which a subsampling procedure removes spatial bias at the cost of record numbers, but not of the spatial extent of records. We show that data with less spatial bias produce better predictive models, even though they are based on fewer input records. Our subsampling routine may therefore be a suitable method for reducing the impact of spatial bias on species distribution models. Our results warn against automated applications of species distribution models to distributional databases (as has been advocated and implemented), because internal model evaluation did not reveal the decline in model quality with increased spatial bias (but rather suggested the opposite), whereas expert evaluation clearly did.
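The abstract does not specify how the subsampling routine works. Purely as an illustration of the general idea (thinning clustered records while keeping their spatial extent), the sketch below uses grid-based thinning, a common technique that may differ from the authors' routine. Every record is assigned to a square latitude/longitude cell, and at most a fixed number of randomly chosen records is kept per occupied cell. Dense clusters are therefore reduced, while every occupied cell, and hence the overall extent, remains represented. The function name `grid_thin` and its parameters `cell_size`, `max_per_cell` and `seed` are hypothetical choices for this sketch.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# A record is (longitude, latitude) in decimal degrees.
Record = Tuple[float, float]


def grid_thin(records: List[Record], cell_size: float = 0.5,
              max_per_cell: int = 1, seed: int = 0) -> List[Record]:
    """Hypothetical grid-based thinning: keep at most `max_per_cell`
    randomly chosen records in each occupied grid cell.

    This illustrates the general technique only; it is not the
    subsampling routine used in the study.
    """
    if cell_size <= 0 or max_per_cell < 1:
        raise ValueError("cell_size must be > 0 and max_per_cell >= 1")

    # Group records by the grid cell they fall into.
    cells: Dict[Tuple[int, int], List[Record]] = defaultdict(list)
    for lon, lat in records:
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        cells[key].append((lon, lat))

    # Subsample within each cell; every occupied cell keeps >= 1 record,
    # so spatial extent is preserved while clusters are reduced.
    rng = random.Random(seed)
    thinned: List[Record] = []
    for key in sorted(cells):
        members = cells[key]
        if len(members) > max_per_cell:
            members = rng.sample(members, max_per_cell)
        thinned.extend(members)
    return thinned


if __name__ == "__main__":
    # Hypothetical data: 20 records clustered in one cell plus 2 isolated records.
    clustered = [(8.01 + 0.01 * i, 47.01) for i in range(20)]
    isolated = [(30.2, 55.1), (100.4, 40.3)]
    kept = grid_thin(clustered + isolated, cell_size=0.5)
    print(len(kept))  # 3: one record from the cluster plus both isolated ones
```

In this toy example, 22 records shrink to 3, yet all three occupied regions are still represented. This is the trade-off described above, in which record numbers are lost but spatial extent is not. The cell size controls how aggressively clusters are thinned and would have to be chosen for the taxon and study region at hand.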