Presence-only versus presence-absence data in species composition determinant analyses
biodiversity, canonical correspondence analysis, environmental determinants, GBIF, simulations, species composition, virtual species
Aim Studying relationships between species and their physical environment requires species distribution data, ideally presence–absence (P–A) data derived from surveys. Such data are limited in their spatial extent. Presence-only (P-O) data are considered inappropriate for such analyses. Our aim was to evaluate whether P-O data may be used to analyse the relationships between environmental factors and species composition when a multitude of species is considered over a large spatial extent.
Location The study was conducted in virtual space. However, the geographic origin of the data used is the contiguous USA.
Methods We created distribution maps for 50 virtual species based on actual environmental conditions in the study area. Sampling locations were based on true observations from the Global Biodiversity Information Facility (GBIF). We produced P–A data by selecting ∼1000 random locations and recording the presence or absence of all species. We produced two P-O data sets. The full P-O data set was produced by sampling the virtual species at locations of true species occurrences. The partial P-O data set was a subset of the full P-O data set, matching the size of the P–A data set. For each data set, we recorded the environmental variables at the same locations. We used canonical correspondence analysis (CCA) to evaluate the amount of variance in species composition explained by each variable. We evaluated the bias in the data set by calculating the deviation of the average values of the environmental variables in sampled locations from those of the entire area.
Results P–A and P-O data sets were similar in terms of the amount of variance explained by the different environmental variables. We found sizable environmental and spatial bias in the P-O data set, compared to the entire study area.
Main conclusions Our results suggest that although P-O data from collections contain bias, the multitude of species, and thus the relatively large amount of information in the data, allows the use of P-O data for analysing environmental determinants of species composition.
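The bias evaluation described in the Methods (deviation of the average environmental values at sampled locations from those of the entire area) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `environmental_bias`, the scaling by the study-area standard deviation, and the synthetic landscape are all assumptions made for illustration only.

```python
import numpy as np


def environmental_bias(env_all, sample_idx):
    """Deviation of sampled-site means from study-area means, per variable.

    env_all    : (n_cells, n_vars) environmental values for every cell
                 in the study area.
    sample_idx : indices of the cells that were sampled.

    Assumption: the deviation is divided by the study-area standard
    deviation so that variables with different units are comparable.
    """
    env_all = np.asarray(env_all, dtype=float)
    area_mean = env_all.mean(axis=0)
    area_sd = env_all.std(axis=0)
    sample_mean = env_all[sample_idx].mean(axis=0)
    return (sample_mean - area_mean) / area_sd


# Hypothetical landscape: 10,000 cells, 3 environmental variables.
rng = np.random.default_rng(0)
env = rng.normal(size=(10_000, 3))

# P-A-like design: 1000 cells drawn uniformly at random.
random_idx = rng.choice(env.shape[0], size=1000, replace=False)

# P-O-like design (assumed): sampling probability rises with variable 0,
# mimicking collectors who favour certain environments.
weights = np.exp(env[:, 0])
weights /= weights.sum()
biased_idx = rng.choice(env.shape[0], size=1000, replace=False, p=weights)

bias_random = environmental_bias(env, random_idx)
bias_biased = environmental_bias(env, biased_idx)
print("random sample bias:", np.round(bias_random, 2))
print("biased sample bias:", np.round(bias_biased, 2))
```

In this toy setting the random sample shows deviations close to zero for every variable, whereas the preferentially sampled set shows a clear positive deviation on the variable that drives sampling effort. This is the kind of contrast the bias measure is meant to expose.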