The EAS group have specialist skills in developing species distribution models. More precisely, these products might be called habitat distribution models: they take known location records for a species and use machine learning algorithms to predict where else suitable habitat for that species may exist.
We have produced several series of these models for conservation planning purposes. Initially they were made only for threatened species, but they now cover almost all vertebrate fauna and vascular plants found across south-eastern Australia. With each new series of models we uncover more of the intricacies and dilemmas involved in producing useful distribution models. We have found that it is very easy to produce a distribution model, but difficult to deliver a product that is entirely fit-for-purpose, especially when these distribution ‘maps’ are used for many different purposes by different end-users.
Producing these models relies on spatially accurate location data for the species, and on some consideration of the currency of those data. For example, a record for a long-lived tree might be expected to remain current for many decades or longer, provided the site is unlikely to have been cleared since, or the tree killed by a disturbance event such as a fire. In contrast, a record for a bird species that is considered a vagrant in south-eastern Australia would have a much more limited ‘shelf-life’. Deciding which presence records are used to train the model is therefore extremely important.
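The idea of a record ‘shelf-life’ can be expressed as a simple filter applied before training. The sketch below is a minimal illustration under stated assumptions: the life-form categories, the shelf-life durations, the `Record` fields, and the 2012 reference year are all hypothetical, not the group's actual rules.

```python
from dataclasses import dataclass

# Hypothetical maximum usable age (years) of a record, by life-form category.
# These values are illustrative assumptions, not published rules.
SHELF_LIFE_YEARS = {
    "long_lived_tree": 100,
    "resident_bird": 20,
    "vagrant_bird": 2,
}


@dataclass
class Record:
    species: str
    life_form: str                 # key into SHELF_LIFE_YEARS
    year: int                      # year the observation was made
    disturbed_since: bool = False  # site known to be cleared or burnt since then


def is_current(rec: Record, reference_year: int) -> bool:
    """True if the record is within its shelf-life and not known to be invalidated."""
    if rec.disturbed_since:
        return False
    max_age = SHELF_LIFE_YEARS.get(rec.life_form)
    if max_age is None:
        return False  # unknown life form: exclude rather than guess
    return 0 <= reference_year - rec.year <= max_age


def training_presences(records, reference_year):
    """Keep only the records that are still considered current."""
    return [r for r in records if is_current(r, reference_year)]


# Example with made-up records, assessed as of 2012
records = [
    Record("tree_sp_A", "long_lived_tree", 1975),
    Record("tree_sp_A", "long_lived_tree", 1990, disturbed_since=True),
    Record("bird_sp_B", "vagrant_bird", 2005),
    Record("bird_sp_B", "vagrant_bird", 2011),
]
kept = training_presences(records, reference_year=2012)
# kept contains the 1975 tree record and the 2011 bird record
```

In practice the cut-offs would be set per species or per group by people who know their ecology; the point of the sketch is only that currency becomes an explicit, reviewable rule rather than an implicit one.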
It is possible to build a purely presence-only model, but such models are not as robust as those that can also be given information about where a species does not occur. This is relatively straightforward for data from plant surveys that record all the species at a site. However, high-quality true absence data for fauna are much more difficult to obtain, and are generally only available from studies covering small geographic areas. This limits the ability to produce useful models across larger regions. Our group has spent considerable time developing robust strategies for allocating pseudo-absence data that enable us to produce useful model outputs across large spatial extents.
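The group's own pseudo-absence strategies are described in its published work and are not reproduced here. Purely as a generic illustration, one common approach draws random locations across the study extent while excluding a buffer around known presences. The grid representation, the buffer distance, and the function name `sample_pseudo_absences` below are assumptions for this sketch.

```python
import math
import random


def sample_pseudo_absences(presences, width, height, n, min_dist, seed=None):
    """Draw n distinct pseudo-absence cells from a width x height grid.

    presences: list of (col, row) cells with known occurrences.
    min_dist:  cells closer than this (in cell units) to any presence are excluded,
               so pseudo-absences are not placed right next to known records.
    """
    rng = random.Random(seed)
    eligible = [
        (c, r)
        for r in range(height)
        for c in range(width)
        if all(math.hypot(c - pc, r - pr) >= min_dist for pc, pr in presences)
    ]
    if len(eligible) < n:
        raise ValueError("not enough eligible cells for the requested sample size")
    return rng.sample(eligible, n)


# Example: two presence cells on a 10 x 8 grid, 2-cell exclusion buffer
presences = [(2, 2), (7, 5)]
pseudo_absences = sample_pseudo_absences(presences, 10, 8, n=5, min_dist=2.0, seed=1)
```

Enumerating every cell is fine for a small demonstration grid; across a large extent one would more likely draw random cells and reject those that fall inside the buffer.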
As well as point-based species occurrence records, the other type of information required to build a distribution model is a set of spatial representations of bio-physical features. These include climate data (temperature, rainfall, evaporation), terrain data (elevation, insolation, local water accumulation, etc.) and soils data, among others. All of these data must be in raster (grid) format. Depending on the type, scale and temporal window of the model required, satellite imagery can also be highly informative. Here again, choosing the right type or form of data for the particular model is critical to creating a useful product.
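Because the environmental layers are grids, building a training table means looking up each layer's value at each record's coordinates. A minimal sketch, assuming each layer is a list of rows on a north-up grid described by its top-left corner and a square cell size; the layer names, coordinates and values are made up.

```python
def cell_index(x, y, origin_x, origin_y, cell_size):
    """Map coordinates -> (row, col) for a north-up grid whose top-left
    corner is at (origin_x, origin_y) and whose cells are square."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


def sample_layers(points, layers, origin_x, origin_y, cell_size):
    """Look up every layer at every (x, y) point.

    layers: dict of name -> grid (list of rows), all sharing the same extent
    and resolution. Points falling outside the grid get None.
    """
    samples = []
    for x, y in points:
        row, col = cell_index(x, y, origin_x, origin_y, cell_size)
        values = {}
        for name, grid in layers.items():
            inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
            values[name] = grid[row][col] if inside else None
        samples.append(values)
    return samples


# Made-up 2 x 3 layers with 100 m cells; top-left corner at (1000, 2000)
layers = {
    "rainfall_mm": [[600.0, 650.0, 700.0],
                    [550.0, 600.0, 640.0]],
    "elevation_m": [[120.0, 300.0, 450.0],
                    [80.0, 150.0, 260.0]],
}
points = [(1150.0, 1950.0), (1010.0, 1850.0), (5000.0, 5000.0)]
samples = sample_layers(points, layers, 1000.0, 2000.0, 100.0)
# samples[0] == {"rainfall_mm": 650.0, "elevation_m": 300.0}
# samples[2] has None for both layers (the point is off the grid)
```

The sketch assumes all layers are already aligned on one grid; real climate, terrain and soils layers would typically need reprojecting and resampling to a common grid first.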
We use a range of machine learning algorithms to construct species distribution models. Of the many methods available, we regularly use Random Forests to learn the relationships between species data and spatially explicit environmental data. Several published papers describe this approach in detail, and further information is published on DSE’s website under the NaturePrint program.
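Random Forests are commonly run from established libraries, such as the `randomForest` package in R or scikit-learn's `RandomForestClassifier` in Python. To show what the method does, here is a deliberately small, dependency-free sketch: each tree is grown on a bootstrap sample of the records, each split considers a random subset of covariates and picks the threshold with the lowest Gini impurity, and the forest averages the trees' leaf probabilities. It is an illustration only, not the group's implementation; the toy covariates and all parameter values are assumptions.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def build_tree(X, y, depth, max_depth, max_features, rng):
    """Grow one classification tree. Leaves are floats (fraction of presences);
    internal nodes are tuples (feature, threshold, left, right)."""
    if depth >= max_depth or len(set(y)) == 1:
        return sum(y) / len(y)
    # Random subset of covariates considered at this split
    features = rng.sample(range(len(X[0])), max_features)
    best = None  # (weighted impurity, feature, threshold)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # chosen covariates are constant here: make a leaf
        return sum(y) / len(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, max_features, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, max_features, rng))


def predict_tree(node, row):
    """Follow splits until reaching a leaf probability."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=5, max_features=None, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    if max_features is None:
        max_features = max(1, int(len(X[0]) ** 0.5))  # common sqrt(p) default
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, max_features, rng))
    return forest


def predict_proba(forest, row):
    """Average the trees' leaf probabilities: a relative suitability score."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Toy training data: [rainfall_mm, elevation_m]; 1 = presence, 0 = pseudo-absence
X = [[900, 300], [850, 400], [950, 250], [880, 500],
     [400, 100], [450, 600], [380, 300], [420, 200]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y, n_trees=25, max_depth=3, seed=42)
wet_site = predict_proba(forest, [920, 350])
dry_site = predict_proba(forest, [410, 350])
```

Applying `predict_proba` to every grid cell's covariate values is what turns the fitted model into a distribution map. Library implementations add much more (out-of-bag error, variable importance, efficient splitting), which is one reason to prefer them in real work.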
Why bother?
What is the point of making literally thousands of different spatial models of the distribution of plants and animals? Ideally we would have perfect knowledge of the distribution of all things, but this is impossible. Distribution models can be very useful for conservation planning when our knowledge of where species occur is incomplete. Uses include:
- prioritising new areas to survey for a particular species
- conservation planning for particular species or multiple species
- supplying inputs to specialist software (e.g. Zonation) that lets us ask questions about the relative value of different parts of the landscape for any particular species, or for all of the species combined.
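Zonation itself is a dedicated program with many options, such as edge removal and connectivity settings. Purely to illustrate the kind of question it answers, the sketch below implements a simplified version of the core-area cell-removal idea: repeatedly remove the cell whose loss costs the smallest maximum weighted share of any species' remaining distribution, so that the cells removed last rank highest. It is not Zonation's code; the function name `removal_order`, the equal default weights and costs, and the toy suitability values are assumptions.

```python
def removal_order(suitability, weights=None, costs=None):
    """Iteratively remove the cell whose loss matters least (core-area style).

    suitability[i][j]: predicted value of cell i for species j.
    Returns cell indices in removal order; the last cell removed is the
    highest conservation priority.
    """
    n_cells, n_species = len(suitability), len(suitability[0])
    weights = weights if weights is not None else [1.0] * n_species
    costs = costs if costs is not None else [1.0] * n_cells
    remaining = set(range(n_cells))
    order = []
    while remaining:
        # Each species' total value over the cells still in the landscape
        totals = [sum(suitability[i][j] for i in remaining) for j in range(n_species)]

        def marginal_loss(i):
            # Largest weighted share of any species' remaining distribution held by cell i
            shares = [weights[j] * suitability[i][j] / totals[j] if totals[j] > 0 else 0.0
                      for j in range(n_species)]
            return max(shares) / costs[i]

        cell = min(sorted(remaining), key=marginal_loss)  # sorted(): deterministic ties
        order.append(cell)
        remaining.remove(cell)
    return order


# Four cells, two hypothetical species
suitability = [
    [0.9, 0.1],
    [0.2, 0.2],
    [0.1, 0.8],
    [0.3, 0.0],
]
order = removal_order(suitability)
# order == [1, 3, 2, 0]: cell 0 is retained longest
```

Taking the maximum over species means a cell holding a large share of even one species' remaining habitat is kept late, which is how a combined ranking can still protect individual species.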
How good are these models?
Models of any type or form are simplifications of the real world. As George Box famously wrote, all models are wrong; what matters is how useful they are for a particular purpose. Species distribution models are simple expressions of where a species (or its habitat) might be located. We have found them to be very useful for a range of tasks, and much more useful than a simple scattering of dots on a map. However, they do have limitations and constraints in spatial extent, granularity and temporal accuracy.
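One common, though partial, way to put a number on how well a model separates known presences from absences or pseudo-absences is the area under the ROC curve (AUC). It equals the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site, with ties counted as half. A minimal sketch follows; the function name and example scores are made up, and this is an illustration only, not a description of the group's evaluation procedure.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: share of (presence, absence) pairs in which the
    presence site has the higher predicted score; ties count as 0.5."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Made-up predicted suitabilities at held-out presence and absence sites
score = auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 is no better than random ranking and 1.0 is perfect ranking. When the ‘absences’ are pseudo-absences, AUC measures discrimination from background rather than from true absence, and it says nothing about limitations in spatial extent, granularity or temporal accuracy.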