Downloading biodiversity records from iNaturalist with Python

biodiversityDS.
Sep 14, 2020

iNaturalist is a global platform where naturalists, citizen scientists, and biologists post their observations with photographs. Observations can be curated by the network of users to provide valuable open data to scientific research projects, conservation agencies and the general public. In particular, the data has been used to describe the global distribution of species, address niche-based questions, support biodiversity and ecosystem-based conservation, and to understand correlations between anthropogenic pressures and population extinctions.

Data may be accessed via the website, the mobile application or the API. Here I provide a few lines of Python code to automatically download multiple records from iNaturalist using the API. In this particular case, the code gets all data for the Parque Marinho Luiz Saldanha, a Marine Protected Area located in Portugal.
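Before the full script, here is a minimal sketch of how the request URL it uses can be assembled from named parameters instead of string concatenation. The query parameters are the ones that appear in the script's URL; the function name `build_observations_url` and the constant `API_ENDPOINT` are only illustrative names, not part of the iNaturalist API.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL used in the script
API_ENDPOINT = "https://api.inaturalist.org/v1/observations"


def build_observations_url(place_id, page, per_page=200):
    """Return the observations URL for one page of results for a place."""
    params = {
        "geo": "true",
        "identified": "true",
        "place_id": place_id,
        "rank": "species",
        "order": "desc",
        "order_by": "created_at",
        "page": page,
        "per_page": per_page,
    }
    # urlencode joins the pairs as key=value&key=value and escapes the values
    return API_ENDPOINT + "?" + urlencode(params)


url = build_observations_url("128892", page=1)
print(url)
```

Keeping the parameters in a dictionary makes it easier to change the place or add a filter without editing a long string.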

# Import dependencies
import os
import urllib.request

import pandas as pd
import requests
from pandas import json_normalize

# Get all records for Parque Marinho Luiz Saldanha
outputFolder = '/theFolderToSaveData'
placeID = '128892'

# Create the output folder if it does not exist yet
os.makedirs(outputFolder, exist_ok=True)
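
# Request pages of 200 records until the API returns an empty page
pages = []
page = 0
dataSize = 1
while dataSize > 0:
    page = page + 1
    url = 'https://api.inaturalist.org/v1/observations?geo=true&identified=true&place_id='+placeID+'&rank=species&order=desc&order_by=created_at&page='+str(page)+'&per_page=200'
    response = requests.get(url)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
    dictr = response.json()
    recs = dictr['results']
    dataSize = len(recs)
    if dataSize > 0:
        # Flatten the nested JSON records into columns such as 'taxon.name'
        pages.append(json_normalize(recs))

# Combine all pages into one data frame (DataFrame.append was removed in pandas 2.0)
df = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame([])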

# Save images to the folder
for x in range(0, len(df)):
    obsID = df['id'].values[x]
    photos = df['photos'].values[x]
    if len(photos) == 0:
        continue  # skip observations that have no photo
    # The API returns the square thumbnail URL; other sizes share the same pattern
    imageURL = photos[0]['url']
    imageURLM = imageURL.replace("square", "medium")
    imageURLO = imageURL.replace("square", "original")
    urllib.request.urlretrieve(imageURL, outputFolder+'/'+str(obsID)+'Sq.jpg')
    urllib.request.urlretrieve(imageURLM, outputFolder+'/'+str(obsID)+'M.jpg')
    urllib.request.urlretrieve(imageURLO, outputFolder+'/'+str(obsID)+'O.jpg')

# Prepare pandas data frame and export to csv
print(list(df.columns))  # inspect the available fields

# "location" holds "latitude,longitude" as a single string
location = df["location"].str.split(",", expand=True)
dfLatitude = location[0]
dfLongitude = location[1]
df1 = df[['id','taxon.name', 'quality_grade', 'time_observed_at', 'species_guess', 'license_code', 'observed_on', 'community_taxon_id']]
df1 = pd.concat([df1, dfLongitude, dfLatitude], axis=1)
df1.columns = ['id','taxon.name', 'quality_grade', 'time_observed_at', 'species_guess', 'license_code', 'observed_on', 'community_taxon_id','Longitude','Latitude']
# Note: the file is tab-separated despite the .csv extension
df1.to_csv(outputFolder+'/dataFrame.csv', sep='\t', header=True)
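
The paging logic in the script (ask for page 1, 2, 3, ... and stop at the first empty page) can also be isolated in a small generator, which makes it testable without touching the network. This is only a sketch: `iter_pages`, `fetch_page`, and the `max_pages` safety cap are hypothetical names and values chosen for illustration, and the fake data stands in for real API responses.

```python
def iter_pages(fetch_page, max_pages=50):
    """Yield non-empty pages of records until an empty page is returned.

    fetch_page: callable that takes a 1-based page number and returns a list.
    max_pages: arbitrary safety cap (an assumption) to avoid an endless loop.
    """
    for page in range(1, max_pages + 1):
        records = fetch_page(page)
        if not records:
            break  # an empty page means there is nothing left to download
        yield records


# Stand-in for the API: three pages of fake records, then nothing
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}], 3: [{"id": 4}]}


def fake_fetch(page):
    return fake_pages.get(page, [])


all_records = [rec for page in iter_pages(fake_fetch) for rec in page]
print(len(all_records))  # 4
```

With the real API, `fetch_page` would wrap the `requests.get(url)` call from the script and return `response.json()['results']`.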

biodiversityDS.

Hi!! I’m Jorge Assis, a Data Scientist, Marine Ecologist, Climate Change Analyst, R and Python Developer based in Portugal [biodiversitydatascience.com]