Access & Collect Data with APIs in R (Example)
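In this tutorial, I’ll demonstrate how to use an API in the R programming language: what an API is, how an API URL is put together, how to call the API, how to convert the JSON results to a data frame, and how to loop over multiple calls.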
Note: This article was written in collaboration with Kirby White. Kirby is an organizational effectiveness consultant and researcher who is currently pursuing a Ph.D. at Seattle Pacific University. You can read more about Kirby here!
Packages
For this tutorial, you’ll need these two packages:
install.packages(c("httr", "jsonlite"))
library(httr)
library(jsonlite)
What is an API?
An API (Application Programming Interface) is an intermediary between a dataset (usually a very large one) and the rest of the world (like us!). APIs provide an accessible way to request a dataset, which is referred to as making a “call” to the API. A call is sent to the API by opening a web address.
In this tutorial, we’re going to request data from the API at COVID Act Now.
Components of a URL
This particular call would request time series COVID data for a single county in the United States (identified by its FIPS code 06037):
https://api.covidactnow.org/v2/county/06037.timeseries.json?apiKey=xyxyxy
There are several pieces to this API call. The first is the base URL:
https://api.covidactnow.org/v2/
This part of the URL will be the same for all our calls to this API.
The county/ portion of the URL indicates that we only want COVID data for a single county. By looking at the COVID Act Now API documentation, I can see that states is an alternative option for this part of the URL.
06037 is the unique identifier for a single county. If I want to get the same data but for a different county, I just have to change this number.
.timeseries tells the API that I want the full history of the data rather than a single snapshot, and .json tells the API to format the data as JSON (which we’ll convert to a data frame).
Everything after ‘apiKey=’ is my authorization token, which tells the COVID Act Now servers that I’m allowed to ask for this data. ‘xyxyxy’ is not a real token, and you can get your own token here.
Now that we’ve dissected the anatomy of an API call, you can see how easy it is to build one! Basically, anybody who has an internet connection and an authorization token and who knows the grammar of the API can access it. Most APIs are published with extensive documentation to help you understand the available options and parameters.
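The pieces described above can be assembled in a small helper function. This is only a minimal sketch: the function name build_covid_url and the environment variable COVID_ACT_NOW_KEY are assumptions made for illustration and are not part of the COVID Act Now API. Reading the token from an environment variable simply keeps it out of the script.

```r
# Hypothetical helper (not provided by COVID Act Now): assemble the call from its parts
build_covid_url <- function(fips,
                            api_key = Sys.getenv("COVID_ACT_NOW_KEY"),
                            base = "https://api.covidactnow.org/v2/") {
  # Fail early if no token is available, before any request is sent
  if (!nzchar(api_key)) {
    stop("No API key found; set COVID_ACT_NOW_KEY or pass api_key directly.")
  }
  # base URL + endpoint + county identifier + data type/format + token
  paste0(base, "county/", fips, ".timeseries.json?apiKey=", api_key)
}

# 'xyxyxy' is the placeholder token from the text, not a working key
build_covid_url("06037", api_key = "xyxyxy")
```

Changing the first argument is all it takes to point the call at a different county.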
Calling an API
It’s easiest to build an API URL by joining multiple text strings together. For this example, I want to get a time series of COVID data for a few counties. Let’s build the URL for one county, and later on we’ll see how to loop through multiple counties.
base <- 'https://api.covidactnow.org/v2/county/'
county <- '06037'
info_key <- '.timeseries.json?apiKey=xyxyxy'
API_URL <- paste0(base, county, info_key)
Now we have the entire URL stored in a simple R object called API_URL.
We can now use the URL to call the API, and we’ll store the returned data in an object called raw_data:
raw_data <- GET(API_URL)
You can type View(raw_data) to examine what the API sent back, which isn’t in a usable format yet. You’ll notice a “status_code” element in the list. By convention, a status code of 200 means that the API call was successful, and other codes (such as those in the 400 and 500 ranges) indicate errors. You can troubleshoot those error codes using the API documentation.
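Rather than inspecting the status by eye, the check can be scripted. The following is a minimal sketch, assuming you want the script to stop as soon as a call fails; status_class and get_or_stop are hypothetical helper names, while GET() and status_code() come from the httr package.

```r
# Hypothetical helper: turn a numeric HTTP status code into a broad category
status_class <- function(code) {
  switch(as.character(code %/% 100),
         "2" = "success",
         "3" = "redirect",
         "4" = "client error",
         "5" = "server error",
         "unknown")
}

# Hypothetical wrapper around httr::GET() that stops on a non-2xx response
get_or_stop <- function(url) {
  resp <- httr::GET(url)
  code <- httr::status_code(resp)
  if (status_class(code) != "success") {
    stop("API call failed with status ", code, " (", status_class(code), ")")
  }
  resp
}

# The classifier can be tried without sending any request
status_class(200)
status_class(404)
```

The httr package also offers stop_for_status(), which raises an error for failed requests in a single call.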
Converting JSON Results to a Data Frame
We received the data in a format that isn’t very easy to work with yet. Thankfully, we can store it in a data frame with just a few steps.
First, we’ll convert the raw data into an R list:
COVID_list <- fromJSON(rawToChar(raw_data$content), flatten = TRUE)
Now that it’s in a list format, you can see that it actually contains several data frames!
You can use this data right away if you are already familiar with lists in R, or you can extract the data frames into separate objects, like this:
df <- COVID_list$actualsTimeseries
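To see what fromJSON() does without needing an API key, the conversion can be tried on a small JSON string. This is a sketch only: the JSON below is a made-up miniature, and just the actualsTimeseries, date, cases and deaths names mirror what appears in this tutorial.

```r
library(jsonlite)

# Made-up miniature response; only the field names mirror the tutorial
json_text <- '{
  "fips": "06037",
  "actualsTimeseries": [
    {"date": "2020-01-26", "cases": 1, "deaths": 0},
    {"date": "2020-01-27", "cases": 1, "deaths": 0}
  ]
}'

# Parse the JSON text into an R list
parsed <- fromJSON(json_text, flatten = TRUE)

# Top-level element names of the list
names(parsed)

# The array of records has become a data frame
small_df <- parsed$actualsTimeseries

# Dates arrive as character strings; convert them before plotting or filtering
small_df$date <- as.Date(small_df$date)
str(small_df)
```

With a real response, rawToChar(raw_data$content) takes the place of json_text.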
The data frame that we have just created contains many different variables and a lot of information. Below, you can see the first six rows of a selection of some interesting variables in our data:
head(df[ , c("cases", "deaths", "newCases", "newDeaths", "date")])
#   cases deaths newCases newDeaths       date
# 1    NA     NA       NA        NA 2020-01-22
# 2    NA     NA       NA        NA 2020-01-23
# 3    NA     NA       NA        NA 2020-01-24
# 4    NA     NA       NA        NA 2020-01-25
# 5     1      0       NA        NA 2020-01-26
# 6     1      0        0         0 2020-01-27
Looping Multiple API Calls
Now that we’ve seen how to make an API call for one county, let’s create a simple loop to make several calls at a time. We’ll use a for loop, which you can read more about here.
First, we’ll create a vector with the ID code for each county we want to get data for:
counties <- c('01001', '01003', '01005')
Then, we’ll loop through each element of the vector and adjust our API_URL accordingly:
base <- 'https://api.covidactnow.org/v2/county/'
info_key <- '.timeseries.json?apiKey=xyxyxy'

for(i in seq_along(counties)) {

  # Build the API URL with the new county code
  API_URL <- paste0(base, counties[i], info_key)

  # Store the raw and processed API results in temporary objects
  temp_raw <- GET(API_URL)
  temp_list <- fromJSON(rawToChar(temp_raw$content), flatten = TRUE)

  # Append the newest results to df (the data frame created earlier for county 06037)
  df <- rbind(df, temp_list$actualsTimeseries)
}
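The loop above appends to a df that already exists. A variation that does not depend on an existing data frame is to collect each result in a list and bind everything once at the end. This is a minimal sketch, not the method used in this tutorial: collect_counties and fake_fetch are hypothetical names, and fake_fetch returns made-up values so that the example runs without an API key.

```r
# Hypothetical helper: fetch_one is any function that takes a county code
# and returns a data frame (for example, a wrapper around GET() and fromJSON())
collect_counties <- function(counties, fetch_one) {
  results <- vector("list", length(counties))
  for (i in seq_along(counties)) {
    one <- fetch_one(counties[i])
    one$fips <- counties[i]  # remember which county each row came from
    results[[i]] <- one
  }
  # Bind once at the end instead of growing a data frame inside the loop
  do.call(rbind, results)
}

# Stand-in fetcher with made-up values so the sketch runs offline
fake_fetch <- function(fips) {
  data.frame(date = c("2020-01-26", "2020-01-27"), cases = c(1, 1))
}

all_counties <- collect_counties(c("01001", "01003", "01005"), fake_fetch)
nrow(all_counties)
```

Adding the fips column is a design choice: without it, rows from different counties cannot be told apart once they are stacked.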
Working with APIs is challenging at first (and even once you have the hang of it!), but they can provide a scalable and customizable way to gather data directly in R.
Video Tutorial & Further Resources
Do you need more explanations on how to use APIs from within R? Then you might have a look at the following YouTube video on the Statistics Globe YouTube channel.
In the video, Kirby White shows another example of how to collect data using APIs. He also presents a Shiny app that he has created based on the API introduced in the video.
Furthermore, you might have a look at the related tutorials on Statistics Globe:
- How to Use R to Download a File from the Internet
- Save & Load RData Workspace Files in R
- How to Write, Run & Use a Loop in R
- All R Programming Tutorials
In case you have any further questions or comments, please let us know in the comments section below. We are happy to read your feedback!
20 Comments
This line does not work
COVID_list <- fromJSON(rawToChar(raw_data$content), flatten = TRUE)
COVID_list is a list of length 1 and contains an "invalid API key" error. There is
data in raw_data$content.
Thus the next line
df <- COVID_list$actualsTimeseries
results in df=NULL
Hey Scott,
Thanks for the comment!
Have you created your own API key and replaced xyxyxy by it? This needs to be done to make the code work.
Regards
Joachim
No, I didn’t do that. I missed the link in the text, “..and you can get your own token here…”
Glad you found it Scott! I hope it works now.
when I go here directly https://api.covidactnow.org/v2/county/
I get an error message that I have to register for an API key at
https://apidocs.covidactnow.org/#register
and tell them what I want to use the data for, as if that’s any of their business?
Thanks for a useful tutorial. Everything went smoothly.
Herb
Hi Herb,
Thanks a lot for the kind feedback, that’s great to hear!
Regards
Joachim
That works well! Great to get more on how to work with bigger databases like this!
Many thanks.
That’s great to hear Andrew! Glad you found it interesting!
I was able to read the data. The dataframe only contains 20 names, while the json file contains several hundred. How do I drill down further into a time series set?
I appreciate your comments.
Thanks
Hey Peter,
Thank you for the comment! I’ve forwarded your question to Kirby.
Regards,
Joachim
Hi Peter,
Sometimes a list (or data frame) contains other dataframes, one for each row. You can access these “sub” data frames by referring to the individual elements. Sometimes it can be easiest to access these by using the View() function and then using the interface to find the right code.
I hope that helps!
-Kirby
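The kind of exploration described in this reply can also be done from the console. This is a minimal sketch on a made-up nested list; apart from actualsTimeseries, the element names are invented for illustration and do not describe the real API response.

```r
# Made-up nested structure standing in for a parsed API response
nested <- list(
  fips = "06037",
  summary = list(note = "example"),
  actualsTimeseries = data.frame(date = c("2020-01-26", "2020-01-27"),
                                 cases = c(1, 1))
)

# One line per top-level element, without expanding everything beneath it
str(nested, max.level = 1)

# Which elements are themselves data frames?
is_df <- vapply(nested, is.data.frame, logical(1))
names(nested)[is_df]

# Drill into one of them by name
head(nested[["actualsTimeseries"]])
```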
Okay, so this is weird. I have installed and included the packages in my script. However, I get the following error:
could not find function “GET”
Please help!
Hey Eugenia,
Have you made sure that the httr package was installed properly? What happens when you execute ?GET
Regards,
Joachim
Great tutorial!
Is the final code considered a function?
The reason I’m asking this is because I would like to create an R package that retrieves information from an API website.
I am not sure if the code provided above is already a function or if I will have to do an extra step to bundle the code into a function.
Thanks,
Hello Alvaro,
I am not an expert on this topic but it looks like you still need to bundle it into a function as it is just a for loop.
Regards,
Cansu
Hey!
Super useful tutorial, I’m having a bit of trouble looping a different API, it seems to be binding only the last SiteCode rather than the three that I have in the list, any ideas on why this might be happening? I’ll put the code below!
Thanks in advance
SiteCodes_all <- c('CLDP0002', 'CLDP0003', 'CLDP0004')
for(i in 1:length(SiteCodes_all)) {
allsites <- paste0(Base,Node,SiteCodes_all[i],'/',Pollutant,StartTime,EndTime,Averaging,Key)
temp_raw <- GET(allsites)
temp_list <- fromJSON(rawToChar(temp_raw$content))
df <- rbind(RoyalLondon_List, temp_list)
}
Hey Eleri,
Thank you for the kind words regarding the tutorial, glad you find it helpful!
Regarding your question, it seems like you are always overwriting the results of the previous iteration at the end of your loop (i.e. df is overwritten by new results).
Does the following code work for you?
RoyalLondon_List_new should contain all your data.
Regards,
Joachim
Thank you for this really wonderful tutorial!
If I was to not have an existing dataframe but wanted to run the loop to end up with a dataframe of several combined calls…how would I go about tweaking the last line of code?
Referring to this line (if I have not declared df outside the loop):
df <- rbind(df, temp_list$actualsTimeseries)
Hello Rine,
I am not sure if I get you well. Could you please give some more details about the case at your hand?
Best,
Cansu