Depending on the species distribution modelling algorithm that you want to use, you might need absence data in addition to presence data. This can either be true absence data or pseudo-absence data.
True absence data
When a species is repeatedly not observed at a particular location, you can presume that it is truly absent. True absence points refer to locations where the environmental conditions are unsuitable for the species to survive. Be careful with such conclusions, however: for some species, such as migratory animals, individuals might only be present at a location in particular seasons. In general, comprehensive surveys can supply true absence data when sites have been visited one or more times using high-quality detection methods suitable for the species. For example, to record true absences of a species that is only active at night, surveys need to be carried out at night; no conclusions about absence can be drawn from surveys conducted only during the daytime. Such surveys are time-consuming, however, and true absence data is therefore rarely available.
Pseudo-absence data
If true absence data is not available for your species of interest, but you want to use an algorithm that compares the environmental conditions of presence sites with those of absence sites, you can use pseudo-absence data. This is inferred absence data based on the information available about the presence locations of the species. It is important to generate pseudo-absence data as carefully as possible, so that the conditions at absence locations are classified correctly. Two aspects of generating pseudo-absence data can be customized in the BCCVL: the number of pseudo-absence points generated and the generation method. The optimum settings for both can differ among algorithms, so it is worth investigating the best options for the algorithm of your choice. For example, Barbet-Massin et al. (2012) compared the performance of a variety of algorithms under different combinations of number of pseudo-absence points and generation method.
Number of pseudo-absence points
With regard to the number of pseudo-absence points generated, it is often advised to take into account their ratio to the number of presence points. This ratio determines the prevalence of the dataset, that is, the proportion of presence points among all points (presence and pseudo-absence) used to fit the model. Prevalence has been shown to influence model accuracy, which highlights the importance of selecting an appropriate ratio.
The default ratio in the BCCVL is set to 1:1 (pseudo-absence points:presence points), which generates as many pseudo-absence points as there are presence points.
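The relationship between the ratio, the number of points, and the resulting prevalence can be sketched in a few lines. This is only an illustration: the function names are hypothetical and not part of the BCCVL, and prevalence is assumed here to mean presence points divided by all points.

```python
def pseudo_absence_count(n_presence: int, ratio: float = 1.0) -> int:
    """Number of pseudo-absence points for a pseudo-absence:presence ratio.

    `ratio` is the number of pseudo-absence points per presence point;
    the default of 1.0 mirrors the 1:1 default described above.
    """
    if n_presence < 0 or ratio <= 0:
        raise ValueError("n_presence must be >= 0 and ratio must be > 0")
    return round(n_presence * ratio)


def sample_prevalence(n_presence: int, n_pseudo_absence: int) -> float:
    """Assumed definition: presence points divided by all points."""
    total = n_presence + n_pseudo_absence
    if total == 0:
        raise ValueError("at least one point is required")
    return n_presence / total
```

With 250 presence points, the 1:1 default gives 250 pseudo-absence points and a prevalence of 0.5, while a 4:1 ratio gives 1,000 pseudo-absence points and a prevalence of 0.2.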
Pseudo-absence generation methods
In the BCCVL, we offer three different methods to generate pseudo-absence data:
Random (default): pseudo-absence points are randomly generated in a predefined geographical area, anywhere except for locations where presence has been recorded. In the BCCVL, the geographical area is either the extent of the environmental/climate layers, or the area defined in the geographical constraint tab of the SDM experiment.
Contrasting environment (referred to as 'SRE' in BCCVL): similar to the random method, but in addition to the exact presence locations, all areas with environmental conditions similar to those at the presence locations are excluded as well. Pseudo-absence points are thus only generated in locations whose environmental conditions contrast with those of the presence locations. For this method you can specify a quantile, which is used to trim the most extreme values of each environmental variable when determining the boundaries of the presence conditions. The default is 0.025, which trims the lowest and highest 2.5% of values, so that the boundaries enclose the central 95% of the values observed at presence locations.
Min-max radius (referred to as 'disk' in BCCVL): this method generates pseudo-absence points only within a delimited geographical distance from recorded presence points, which you specify as a minimum and a maximum radius around each presence location. Setting a minimum distance ensures that pseudo-absence points are not generated too close to a presence record, where you can assume that the environmental conditions would be too similar. Setting a maximum distance ensures that pseudo-absence points are not generated in inappropriate locations, which may result in over-prediction.
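The three methods differ only in which candidate locations they allow. The sketch below illustrates that idea on a toy grid; it is not the code the BCCVL runs. All names are hypothetical, cells are plain (x, y) tuples, distances are planar rather than geographic, and the quantile uses simple linear interpolation.

```python
import math
import random


def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low, high = math.floor(pos), math.ceil(pos)
    return ordered[low] + (pos - low) * (ordered[high] - ordered[low])


def draw(pool, n, rng):
    """Draw up to n distinct cells from the pool of allowed cells."""
    return rng.sample(pool, min(n, len(pool)))


def random_pa(cells, presences, n, rng):
    """Random: any cell in the area except recorded presence cells."""
    occupied = set(presences)
    return draw([c for c in cells if c not in occupied], n, rng)


def sre_pa(cells, env, presences, n, rng, quant=0.025):
    """Contrasting environment: only cells outside the presence envelope."""
    n_vars = len(env[presences[0]])
    # Per-variable bounds after trimming `quant` from each tail.
    bounds = [
        (quantile([env[p][v] for p in presences], quant),
         quantile([env[p][v] for p in presences], 1 - quant))
        for v in range(n_vars)
    ]
    occupied = set(presences)

    def inside(cell):
        return all(lo <= x <= hi for x, (lo, hi) in zip(env[cell], bounds))

    pool = [c for c in cells if c not in occupied and not inside(c)]
    return draw(pool, n, rng)


def disk_pa(cells, presences, n, rng, min_dist, max_dist):
    """Min-max radius: nearest presence lies between min_dist and max_dist."""
    occupied = set(presences)

    def nearest(cell):
        return min(math.dist(cell, p) for p in presences)

    pool = [c for c in cells
            if c not in occupied and min_dist <= nearest(c) <= max_dist]
    return draw(pool, n, rng)


# Toy study area: a 20 x 20 grid with two made-up environmental gradients.
cells = [(x, y) for y in range(20) for x in range(20)]
env = {(x, y): (float(x), 2.0 * y) for (x, y) in cells}
presences = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
rng = random.Random(0)

pa_random = random_pa(cells, presences, 5, rng)
pa_sre = sre_pa(cells, env, presences, 5, rng)
pa_disk = disk_pa(cells, presences, 5, rng, min_dist=2.0, max_dist=5.0)
```

Each function returns fewer than the requested number of points if too few cells qualify, which can happen with a narrow radius or a small study area.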
In general, Barbet-Massin et al. (2012) recommended, for classification techniques (Classification Tree, Random Forest, Boosted Regression Tree), using the same number of pseudo-absence points as presence points (1:1 ratio), generated in locations with environmental conditions that contrast with those of the presence points. For regression techniques (Generalized Linear Model, Generalized Additive Model), they recommended a large number (10,000) of pseudo-absence points randomly generated in the study area.
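As a compact restatement of that recommendation, the following sketch maps an algorithm family to a generation method and a number of points. The algorithm labels ('ctree', 'rf', 'brt', 'glm', 'gam') and the function name are hypothetical shorthand chosen for this example, not BCCVL identifiers; the method names 'sre' and 'random' follow the labels used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoAbsenceSetting:
    method: str    # 'random' or 'sre' (contrasting environment)
    n_points: int  # number of pseudo-absence points to generate


# Hypothetical short labels for the techniques named in the text.
CLASSIFICATION = {"ctree", "rf", "brt"}
REGRESSION = {"glm", "gam"}


def recommended_setting(algorithm: str, n_presence: int) -> PseudoAbsenceSetting:
    """Settings following the summary of Barbet-Massin et al. (2012) above."""
    key = algorithm.lower()
    if key in CLASSIFICATION:
        # 1:1 ratio, generated in contrasting environmental conditions.
        return PseudoAbsenceSetting("sre", n_presence)
    if key in REGRESSION:
        # Large number of points, generated at random in the study area.
        return PseudoAbsenceSetting("random", 10_000)
    raise KeyError(f"no recommendation recorded for {algorithm!r}")
```

For a species with 120 presence records, `recommended_setting("rf", 120)` yields 120 points with the 'sre' method, while `recommended_setting("glm", 120)` yields 10,000 points with the 'random' method.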
References
- Barbet-Massin M, Jiguet F, Albert CH, Thuiller W (2012) Selecting pseudo-absences for species distribution models: how, where and how many? Methods in Ecology and Evolution, 3(2), 327-338.
- Chefaoui RM, Lobo JM (2008) Assessing the effects of pseudo-absences on predictive distribution model performance. Ecological Modelling, 210(4), 478-486.
- Lobo JM, Jiménez‐Valverde A, Hortal J (2010) The uncertain nature of absences and their importance in species distribution modelling. Ecography, 33(1), 103-114.
- Phillips SJ, Dudík M, Elith J, Graham CH, Lehmann A, Leathwick J, Ferrier S (2009) Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications, 19(1), 181-197.
- VanDerWal J, Shoo LP, Graham C, Williams SE (2009) Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know? Ecological Modelling, 220(4), 589-594.