Mixed impressions in species distribution modeling

By A L Carter

Claude Monet was one of the founding members of the Anonymous Society, a small group of artists that started the Impressionist movement in Paris. He also possessed a Darwinian level of beard follicles. In the 19th century, Impressionist painting techniques were considered radical by those who were accustomed to looking at paintings of people and places and things and were SHOCKED to see paintings that only showed artists’ impressions of people and places and things. In the 21st century, though, Monet is 10,000+ greeting cards-strong. So even if you’re not sure if you’ve ever seen a Monet painting, you’ve definitely seen a Monet painting.

The impressions of Impressionism are created using bright colors – relatively new in Monet’s time – layered with visible brush strokes. When viewed as a whole, they form an image. If you stand very close to a Monet painting, you will see the individual brush strokes but not the whole image. If you walk back-and-forth enough times in front of the same Monet painting, you will become dizzy, fall down, and promptly be removed by museum security.

Impression, Sunrise by Oscar-Claude Monet (d. 1926). 1872. Oil on canvas.

The basic idea of a species distribution model (SDM) is that we can describe a species’ environmental niche – the conditions it needs to live – by determining how its geographic distribution overlaps with the environment (you can read about SDMs here and about a very popular method of building them here). They’re also called ‘climate envelopes’ or ‘ecological niche models,’ and they’re one of the most widely used tools in ecology.

Each piece of environmental data is like a single Impressionistic brush stroke. When we have enough occurrence records for a species, the different layers of environmental data form an image of its environmental niche. The question that remains is whether that image is a ‘good’ (or ‘excellent’ or ‘not even close, try again’) approximation of that niche. Answering that question – that is, doing model evaluation – is an essential step. A good model of a mosquito’s environmental niche allows us to answer important questions like “Is this disease going to spread north with climate change and kill us all?”

In a paper I wish I’d written, Fourcade et al. (2017) evaluated two sets of distribution models for 497 different species. They built the first set using the 20 WorldClim variables, a gridded environmental dataset that includes 19 bioclimatic variables (BioClim), which are various representations of temperature and rainfall, plus elevation. In the four leading biogeography journals, nearly 90% of distribution models published from 2012-2016 used at least one of the BioClim variables, and 20% used all of them. They built the second set of SDMs using a set of 20 pseudo-variables that they…wait for it…derived from a Google Image search for ‘classical painting.’

In a Results section that should give spatial ecologists everywhere that intrigued-but-also-deeply-uncomfortable-with-this feeling, the authors found that the SDMs built using painting-derived variables (that is, complete and utter nonsense) were often evaluated as ‘good’ or ‘excellent’ models. Don’t run away. It gets better (worse?). Because 30% of those models were evaluated as better than the SDMs built using the real environmental data.
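If you want to feel that discomfort first-hand, here is a minimal toy sketch in Python. To be clear, it is not the authors' method (they built full SDMs from 20 painting-derived layers), and everything in it is an assumption made up for illustration: the 40 x 40 grid, the disk-shaped species range, the `brightness` layer standing in for a painting, the one-variable "envelope" model, and the use of AUC as the evaluation metric. It shows one way a meaningless but spatially smooth layer can earn a respectable score when a species' occurrences are clustered in space.

```python
def auc(presence_scores, background_scores):
    """Chance that a random presence outscores a random background cell.

    Ties count as half a win. A value of 0.5 is what random guessing gives.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


SIZE = 40  # hypothetical 40 x 40 study area


def brightness(x, y):
    """Hypothetical 'painting' layer: brightness rises from left to right.

    It has no biological meaning whatsoever.
    """
    return float(x)


def is_present(x, y):
    """Hypothetical species range: a disk of radius 6 centred on (10, 10)."""
    return (x - 10) ** 2 + (y - 10) ** 2 <= 36


cells = [(x, y) for x in range(SIZE) for y in range(SIZE)]
presence = [c for c in cells if is_present(*c)]
background = [c for c in cells if not is_present(*c)]

# 'Fit' a one-variable envelope: the closer a cell's brightness is to the
# mean brightness at presence cells, the more 'suitable' the cell looks.
mean_brightness = sum(brightness(*c) for c in presence) / len(presence)


def suitability(cell):
    return -abs(brightness(*cell) - mean_brightness)


# Evaluate on the same cells used for fitting (no held-out data in this toy).
nonsense_auc = auc(
    [suitability(c) for c in presence],
    [suitability(c) for c in background],
)
print(f"AUC of the painting-only model: {nonsense_auc:.2f}")
```

On this toy grid the painting-only model scores comfortably above the 0.5 you would expect from random guessing, purely because the species is clustered and the layer is smooth. Nothing in the score itself tells you the predictor is nonsense.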

The limitations of SDMs are already well-recognized (see here and here), as are the limitations of the metrics used to evaluate them (see here and here). But. What this study has done goes beyond identifying and endlessly re-hashing those limitations. As the authors state, “this result shows that SDMs with no biological realism could easily occur and remain undetected…” Emphasis mine.

As ecologists, we can confidently apply individual brush strokes to our canvas, and we can step back to see whether our layers form a discernible image. And maybe they do. And maybe it’s a lovely picture. But are we Monet? Or did our cat walk across the canvas when we weren’t looking? Using contemporary evaluation metrics, we have absolutely no idea what we’re painting until we’re finished and no idea whether our finished painting is the one we meant to create in the first place.

So what to do about this conundrum? The authors state – rightly – that the environmental variables underlying SDMs need to be selected rigorously, based on empirical knowledge of organisms’ physiology. While we should already be doing this, the fact that 20% of published SDMs used all 19 BioClim variables does not give me much confidence on this front (but that’s another post). But even given that knowledge, we would still be unable to confidently evaluate our models.

The authors also state that we need better metrics for evaluating SDMs. This is also true, and it is the major conclusion of the paper: their findings “…question the current practices of the SDM community in terms of model evaluation and variable selection.” That’s a very generous way of saying that, until we have evaluation metrics that can differentiate between SDMs built with actual environmental data vs. internet-sourced nonsense, we really just need to stop using them. Discussion welcome.

And also, if you haven’t before, go see a real Monet painting.

Author biography: A L Carter is a postdoc who uses remote sensing and GIS to study thermal ecology of [mostly] reptiles. Maintains a close friendship with R and tense mutual understanding with GRASS. Follow at @NthChapter

Image caption: An impressionist painting depicting a small rowboat on calm water near a harbor entrance, with a muted sunrise reflecting on the water. Mist obscures larger ships and distant buildings in an industrialized port. Credit- Periodic .gif created at using image downloaded from Wikimedia Commons (link)