Detecting Invasive Flora: Leveraging Web Scraping and API Integration with Biodiversity Databases

Tool for rapidly scanning a list of species for potential invasives using web scraping and API calls
Overview
I developed this tool as part of a much larger data analysis and processing pipeline while working as a consultant for Conservation International (CI). The pipeline processes and analyzes data from the Priceless Planet Coalition (PPC), a massive tree restoration effort initiated by MasterCard that aims to restore 100 million trees by 2025.
To ensure that these efforts go beyond simple ‘tree planting’ projects, which do not always result in a genuinely restored ecosystem, a robust monitoring protocol was developed: project partners periodically send people into the field to collect data about the restoration sites, including counts of species and size classes of trees. Restoration requires ongoing work and attention. Because this monitoring continues for five years after initial planting, CI and MasterCard can verify that projects are sustainably restoring trees and the myriad ecosystem services they provide.
With so many project partners planting thousands of different species in dozens of different countries, it is very important to ensure that invasive species are not being planted. Scanning the list of planted species manually would have taken a long time, so I developed scripts to do this rapidly.
This app is hosted on shinyapps.io, and I have also embedded it here so you can use it directly. The easiest way to see it work is to click the ‘Run Demo’ button, which generates output from a preloaded species list. Alternatively, you can upload your own CSV or type in a comma-separated list of species.
How it works
The script relies primarily on scraping the Global Invasive Species Database (GISD) website. Its pages had easy-to-parse HTML source, which allowed me to use the ‘rvest’ package to locate and extract the essential elements of each invasive species result: a summary of invasiveness, the native range of the plant, and the alien range of the plant.
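As a rough illustration of that extraction step, here is a minimal sketch using ‘rvest’. The HTML snippet, its element ids, and the helper names `extract_field` and `scrape_record` are all invented for this example; they are not the real GISD page structure or the app's actual code, and the text values are placeholders.

```r
library(rvest)

# Hypothetical markup standing in for one species page. The ids below are
# invented for illustration and are NOT the real GISD page structure.
sample_page <- read_html('<html><body>
  <div id="summary">Placeholder summary text</div>
  <div id="native-range">Placeholder native range</div>
  <div id="alien-range">Placeholder alien range</div>
</body></html>')

# Return the text of the first element matching a CSS selector.
extract_field <- function(page, css) {
  html_text2(html_element(page, css))
}

# Collect the three fields into a one-row data frame for one species.
scrape_record <- function(page, species) {
  data.frame(
    species = species,
    summary = extract_field(page, "#summary"),
    native_range = extract_field(page, "#native-range"),
    alien_range = extract_field(page, "#alien-range"),
    stringsAsFactors = FALSE
  )
}

record <- scrape_record(sample_page, "Example species")
print(record)
```

Against a live site, the page would come from `read_html()` called on the species URL, and the selectors would need to match whatever markup the site actually serves.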
Since I knew most plants would not be invasive, and since I didn’t want to send needless requests to the GISD website, I first made API calls to the Global Biodiversity Information Facility (GBIF), which houses a record of which plants are included in the GISD and allows a higher rate of API requests. This way, I could check whether a species actually existed in the database before scraping the GISD website itself.
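The pre-check could be sketched roughly as follows, assuming GBIF's public species endpoint (`https://api.gbif.org/v1/species`) queried by `name` and restricted to one checklist with `datasetKey`. The function names are hypothetical, ‘jsonlite’ is my choice for this sketch rather than necessarily what the app uses, and `dataset_key` is a placeholder: you would need to look up the key of the GISD checklist on GBIF yourself.

```r
library(jsonlite)

# Build the lookup URL for one scientific name within one GBIF checklist.
# `dataset_key` is a placeholder supplied by the caller (assumption).
gbif_lookup_url <- function(species, dataset_key) {
  paste0(
    "https://api.gbif.org/v1/species",
    "?name=", URLencode(species, reserved = TRUE),
    "&datasetKey=", URLencode(dataset_key, reserved = TRUE)
  )
}

# TRUE when a parsed response contains at least one matching record.
has_match <- function(parsed) {
  length(parsed$results) > 0
}

# Performs the network request; only species that pass this check would
# go on to the scraping step.
is_listed <- function(species, dataset_key) {
  has_match(fromJSON(gbif_lookup_url(species, dataset_key),
                     simplifyVector = FALSE))
}

# Offline demonstration with hand-written responses (no network needed).
empty_response <- fromJSON('{"results": []}', simplifyVector = FALSE)
hit_response <- fromJSON('{"results": [{"key": 1}]}', simplifyVector = FALSE)
print(has_match(empty_response))
print(has_match(hit_response))
```

Keeping the URL building and response parsing separate from the request itself makes the logic easy to check without hitting the API.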
The output of the tool is a CSV with columns storing the relevant information from the GISD website. To test it yourself, you can upload your own CSV with a column called ‘species’, type in a comma-separated list of species, or simply run the demo, which uses a preloaded species list.
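The input and output handling described above can be sketched in base R along these lines. The function names and the output column names are hypothetical; only the ‘species’ input column comes from the description above.

```r
# Split a typed, comma-separated list into a clean character vector,
# dropping blanks and duplicates.
parse_species_text <- function(text) {
  parts <- trimws(strsplit(text, ",", fixed = TRUE)[[1]])
  unique(parts[nzchar(parts)])
}

# Read species names from a CSV that must contain a 'species' column.
read_species_csv <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  if (!"species" %in% names(df)) {
    stop("The CSV needs a column named 'species'.")
  }
  cleaned <- trimws(as.character(df$species))
  unique(cleaned[nzchar(cleaned)])
}

# Demonstration using temporary files and placeholder names.
input_path <- tempfile(fileext = ".csv")
write.csv(data.frame(species = c("Species a", "Species b", "Species a")),
          input_path, row.names = FALSE)

from_csv <- read_species_csv(input_path)
from_text <- parse_species_text("Species a, Species b,, Species a ")

# Hypothetical output layout: one row per species, fields filled in later.
results <- data.frame(
  species = from_csv,
  summary = NA_character_,
  native_range = NA_character_,
  alien_range = NA_character_,
  stringsAsFactors = FALSE
)
output_path <- tempfile(fileext = ".csv")
write.csv(results, output_path, row.names = FALSE)
```

Both input routes produce the same kind of character vector, so the rest of the scan does not need to know how the list was supplied.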
Links
- Invasive species scanner Shiny app
- Source code for the app
- GitHub Repo for the Full Data Analysis Pipeline (which I also talk about as a separate project)
Credit
As I was searching for ways to automate this scanning process, I came across an older R package called ‘originr’, which sought to perform a similar function. I leaned heavily on the strategies employed by the package authors and am grateful to have found their code!