Access & Collect Data with APIs in R (Example)

In this tutorial, I’ll demonstrate how to use an API in R Programming.

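We’ll cover the required packages, what an API is, the components of an API URL, how to call an API, how to convert the JSON results to a data frame, and how to loop over multiple API calls.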


Note: This article was written in collaboration with Kirby White. Kirby is an organizational effectiveness consultant and researcher who is currently pursuing a Ph.D. at Seattle Pacific University. You can read more about Kirby here!

 

Packages

For this tutorial, you’ll need these two packages:

install.packages(c("httr","jsonlite"))
library(httr)
library(jsonlite)

What is an API?

An API (Application Programming Interface) is an intermediary between a dataset (usually a very large one) and the rest of the world (like us!). APIs provide an accessible way to request a dataset, which is referred to as making a “call” to the API. A call is sent to the API by opening a web address.

In this tutorial, we’re going to request data from the API at COVID Act Now.

 

Components of a URL

This particular call would request time series COVID data for a single county in the United States (identified by its FIPS code 06037):

https://api.covidactnow.org/v2/county/06037.timeseries.json?apiKey=xyxyxy

There are several pieces to this API call. The first is the base URL:

https://api.covidactnow.org/v2/

This part of the URL will be the same for all our calls to this API.

The county/ portion of the URL indicates that we only want COVID data for a single county. By looking at the COVID Act Now API documentation, I can see that states is an alternative option for this part of the URL.

06037 is the unique identifier for a single county. If I want to get the same data but for a different county, I just have to change this number.

.timeseries provides the API with more information about the data I’m requesting, and .json tells the API to return the data in JSON format (which we’ll convert to a data frame).

Everything after ‘apiKey=’ is my authorization token, which tells the COVID Act Now servers that I’m allowed to ask for this data. ‘xyxyxy’ is not a real token, and you can get your own token here.

Now that we’ve dissected the anatomy of an API call, you can see how easy it is to build one! Basically, anybody who has an internet connection and an authorization token, and who knows the grammar of the API, can access it. Most APIs are published with extensive documentation to help you understand the available options and parameters.
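If you want to check this anatomy programmatically, the following minimal sketch splits the example URL into its parts using parse_url() from the httr package. No request is sent, and xyxyxy is still the placeholder token from above, not a working key.

```r
# Split the example call into its components (no request is sent)
example_url <- "https://api.covidactnow.org/v2/county/06037.timeseries.json?apiKey=xyxyxy"
parts <- httr::parse_url(example_url)

parts$scheme         # protocol
parts$hostname       # server that hosts the API
parts$path           # version, county level, county code and format suffixes
parts$query$apiKey   # authorization token (placeholder)
```

Printing these elements one by one makes it easier to see which part of the URL you need to change for a different request.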

 

Calling an API

It’s easiest to build an API URL by joining multiple text strings together. For this example, I want to get a time series of COVID data for a few counties. Let’s build the URL for one county, and later on we’ll see how to loop through multiple counties.

base <- 'https://api.covidactnow.org/v2/county/'
county <- '06037'
info_key <- '.timeseries.json?apiKey=xyxyxy'
 
API_URL <- paste0(base, county, info_key)

Now we have the entire URL stored in a simple R object called API_URL.
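Note that the county code is stored as a character string: stored as a number, 06037 would lose its leading zero. The sketch below wraps the URL construction in a small function that checks this. The function name build_county_url() and the environment variable COVID_ACT_NOW_KEY are assumptions made for illustration; they are not part of httr or the COVID Act Now API.

```r
# Hypothetical helper (illustration only): build the URL for one county
build_county_url <- function(fips, api_key = Sys.getenv("COVID_ACT_NOW_KEY")) {
  # FIPS county codes have five digits and may start with 0,
  # so they have to be passed as a single character string
  if (length(fips) != 1 || !is.character(fips) || !grepl("^[0-9]{5}$", fips)) {
    stop("fips must be a five-digit character string, e.g. '06037'")
  }
  paste0("https://api.covidactnow.org/v2/county/", fips,
         ".timeseries.json?apiKey=", api_key)
}

# Same URL as above, built with the placeholder token
example_call <- build_county_url("06037", api_key = "xyxyxy")
```

Reading the token from an environment variable is one way to keep it out of scripts that you share with others.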

We can now use the URL to call the API, and we’ll store the returned data in an object called raw_data:

raw_data <- GET(API_URL)

You can type View(raw_data) to examine what the API sent back, which isn’t in a usable format yet. You’ll notice a “status_code” element of the list, which you can also extract with status_code(raw_data). Traditionally, a status of “200” means that the API call was successful, and other codes are used to indicate errors. You can troubleshoot those error codes using the API documentation.
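As a rough guide to reading status codes, the sketch below groups them by the standard HTTP classes (2xx success, 4xx client error, 5xx server error). The helper describe_status() is hypothetical and only meant as an illustration; the exact meaning of a specific code is defined in the API documentation.

```r
# Hypothetical helper: translate an HTTP status code into a short message
describe_status <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 400 && code < 500) {
    "client error (check the URL and the API key)"
  } else if (code >= 500 && code < 600) {
    "server error (try again later)"
  } else {
    "other (see the API documentation)"
  }
}

describe_status(200)
describe_status(403)

# With a real response object:
# describe_status(httr::status_code(raw_data))
```

The httr package also provides stop_for_status(), which turns an unsuccessful response into an R error.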

 

Converting JSON Results to a Data Frame

We received the data in a format that isn’t very easy to work with yet. Thankfully, we can store it in a data frame with just a few steps.

First, we’ll convert the raw data into an R list:

COVID_list <- fromJSON(rawToChar(raw_data$content), flatten = TRUE)

Now that it’s in a list format, you can see that it actually contains several data frames!

You can use this data right away if you are already familiar with lists in R, or you can extract the data frames into separate objects, like this:

df <- COVID_list$actualsTimeseries

The data frame that we have just created contains many different variables and a lot of information. Below, you can see the first six rows of a selection of some interesting variables in our data:

head(df[ , c("cases", "deaths", "newCases", "newDeaths", "date")])
#   cases deaths newCases newDeaths       date
# 1    NA     NA       NA        NA 2020-01-22
# 2    NA     NA       NA        NA 2020-01-23
# 3    NA     NA       NA        NA 2020-01-24
# 4    NA     NA       NA        NA 2020-01-25
# 5     1      0       NA        NA 2020-01-26
# 6     1      0        0         0 2020-01-27
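If you want to try the conversion steps without an API key, here is a minimal offline sketch. The JSON string is a hand-written stand-in that copies only three fields and a few values from the output above; a real response contains many more elements.

```r
library(jsonlite)

# Hand-written stand-in for an API response (assumption: heavily simplified)
mock_json <- '{
  "actualsTimeseries": [
    {"date": "2020-01-25", "cases": null, "deaths": null},
    {"date": "2020-01-26", "cases": 1, "deaths": 0},
    {"date": "2020-01-27", "cases": 1, "deaths": 0}
  ]
}'

# httr stores the response body as raw bytes, which is why rawToChar() is needed
mock_content <- charToRaw(mock_json)

# Same two steps as above: raw bytes -> text -> R list -> data frame
mock_list <- fromJSON(rawToChar(mock_content), flatten = TRUE)
mock_df <- mock_list$actualsTimeseries

# Dates arrive as character strings, so convert them before sorting or plotting
mock_df$date <- as.Date(mock_df$date)

# Keep only the days on which a case count was reported
mock_reported <- mock_df[!is.na(mock_df$cases), ]
```

JSON null values become NA in the data frame, which explains the NA rows at the beginning of the time series.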

 

Looping Multiple API Calls

Now that we’ve seen how to make an API call for one county, let’s create a simple loop to make several calls at a time. We’ll use a for loop, which you can read more about here.

First, we’ll create a vector with the ID code for each county we want to get data for:

counties <- c('01001', '01003', '01005')

Then, we’ll loop through each element of the vector and adjust our API_URL accordingly:

base <- 'https://api.covidactnow.org/v2/county/'
info_key <- '.timeseries.json?apiKey=xyxyxy'
 
# Start with an empty data frame so that earlier results are not mixed in
df <- data.frame()
 
for(i in seq_along(counties)) {
 
  # Build the API URL with the new county code
  API_URL <- paste0(base, counties[i], info_key)
 
  # Store the raw and processed API results in temporary objects
  temp_raw <- GET(API_URL)
  temp_list <- fromJSON(rawToChar(temp_raw$content), flatten = TRUE)
 
  # Label the rows with their county code so the counties can be told apart
  temp_df <- temp_list$actualsTimeseries
  temp_df$fips <- counties[i]
 
  # Add the most recent results to your data frame
  df <- rbind(df, temp_df)
}
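As an alternative to the for loop, here is a sketch that uses lapply() and skips counties whose request fails instead of stopping. The function names fetch_county() and collect_counties() are hypothetical. The download step is passed in as an argument, so the combining logic can be tried offline with a stand-in function, as done at the end of the block.

```r
# Hypothetical helper: download and parse the time series for one county
fetch_county <- function(fips, api_key) {
  url <- paste0("https://api.covidactnow.org/v2/county/", fips, ".timeseries.json")
  resp <- httr::GET(url, query = list(apiKey = api_key))
  httr::stop_for_status(resp)  # turn HTTP errors into R errors
  txt <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt, flatten = TRUE)$actualsTimeseries
}

# Hypothetical helper: apply a fetch function to every county and stack the results
collect_counties <- function(counties, fetch) {
  results <- lapply(counties, function(fips) {
    out <- tryCatch(fetch(fips), error = function(e) NULL)
    if (is.null(out)) return(NULL)  # skip counties that could not be retrieved
    out$fips <- fips                # label rows with their county code
    out
  })
  results <- Filter(Negate(is.null), results)
  do.call(rbind, results)
}

# Offline demonstration with a stand-in fetch function (made-up values)
fake_fetch <- function(fips) {
  if (fips == "00000") stop("unknown county")
  data.frame(date = "2020-01-26", cases = 1)
}
demo <- collect_counties(c("01001", "00000", "01003"), fake_fetch)

# Real usage (needs a valid key):
# df_all <- collect_counties(counties, function(f) fetch_county(f, api_key = "xyxyxy"))
```

Separating the download from the combining step also makes it easier to add a pause between calls if the API limits the number of requests.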

Working with APIs is challenging at first (and even once you have the hang of it!), but they can provide a scalable and customizable way to gather data directly in R.

 

Video Tutorial & Further Resources

Do you need more explanations on how to use APIs from within R? Then you might have a look at the following YouTube video of IDG TECHtalk. In the video, the speaker shows another example on how to collect data using APIs.

 


 


In case you have any further questions or comments, please let us know in the comments section below. We are happy to read your feedback!

 
