Reshape Data with Multiple Measure Columns from Wide to Long Format in R (2 Examples)
This tutorial is about how to reshape data with multiple measure columns from wide to long format in R.
The table of content is structured as follows:
Let’s dive into it!
Add-On Packages & Example Data
There are two commonly used packages, namely tidyr and reshape2, for reshaping data. I’ll illustrate the use of the gather() and melt() functions of the tidyr and reshape2 packages, respectively.
Note that there are even further alternatives for converting data sets from wide to long format available. Recently, the pivot_longer function is getting more popular. You can learn more about this function in this tutorial.
However, for the demonstration in the present tutorial, the built-in dataset USArrests is imported with a simple data manipulation in order to change the default indexing. The data shows the arrests per 100.000 inhabitants for assault, murder and rape, and the percentage of the urban population for each 50 US states.
data("USArrests") States<-rownames(USArrests) USArrests<-cbind(States, USArrests, row.names=NULL) head(USArrests, 10) # States Murder Assault UrbanPop Rape # 1 Alabama 13.2 236 58 21.2 # 2 Alaska 10.0 263 48 44.5 # 3 Arizona 8.1 294 80 31.0 # 4 Arkansas 8.8 190 50 19.5 # 5 California 9.0 276 91 40.6 # 6 Colorado 7.9 204 78 38.7 # 7 Connecticut 3.3 110 77 11.1 # 8 Delaware 5.9 238 72 15.8 # 9 Florida 15.4 335 80 31.9 # 10 Georgia 17.4 211 60 25.8
As seen above, the data is in a wide format such that the multiple measures of a single state are recorded in one row.
Example 1: Wide to Long Format (The tidyr Package)
The gather() function uses 4 main arguments, which refer to the dataset, the name of the variable used to store the measurement names, the name of the variable used to store the values, and the columns to be gathered. Here is an example for gathering all the measures for each state.
install.packages("tidyr") library(tidyr) USArrests_long1<-gather(data=USArrests, key= Measures, value=Values, Murder:Rape) head(USArrests_long1,10) # States Measures Values # 1 Alabama Murder 13.2 # 2 Alaska Murder 10.0 # 3 Arizona Murder 8.1 # 4 Arkansas Murder 8.8 # 5 California Murder 9.0 # 6 Colorado Murder 7.9 # 7 Connecticut Murder 3.3 # 8 Delaware Murder 5.9 # 9 Florida Murder 15.4 # 10 Georgia Murder 17.4
The same output can also be displayed as sorted by the states as a matter of preference.
head(USArrests_long1[order(USArrests_long1$States),],10) # States Measures Values # 1 Alabama Murder 13.2 # 51 Alabama Assault 236.0 # 101 Alabama UrbanPop 58.0 # 151 Alabama Rape 21.2 # 2 Alaska Murder 10.0 # 52 Alaska Assault 263.0 # 102 Alaska UrbanPop 48.0 # 152 Alaska Rape 44.5 # 3 Arizona Murder 8.1 # 53 Arizona Assault 294.0
One might be interested in gathering the rates of arrests only. In such a case, the column selection should be coded as follows. The select documentation can be visited for more advanced selection methods.
USArrests_long2<-gather(data=USArrests, key= Measures, value=Values, c(Murder:Rape, -UrbanPop)) head(USArrests_long2[order(USArrests_long2$States),],10) # States UrbanPop Measures Values # 1 Alabama 58 Murder 13.2 # 51 Alabama 58 Assault 236.0 # 101 Alabama 58 Rape 21.2 # 2 Alaska 48 Murder 10.0 # 52 Alaska 48 Assault 263.0 # 102 Alaska 48 Rape 44.5 # 3 Arizona 80 Murder 8.1 # 53 Arizona 80 Assault 294.0 # 103 Arizona 80 Rape 31.0 # 4 Arkansas 50 Murder 8.8
It is important to note that omitting the UrbanPop variable from the column selection does not yield a variable drop in the dataset. Instead, the variable becomes an identity variable like States, hence a data manipulation is required if dropping is desired. In this respect, the melt() function is more intuitive and provides more flexibility. Let’s see how differently melt() operates.
Example 2: Wide to Long Format (The reshape2 Package)
The melt() function uses 5 main arguments which refer to the dataset, identity variables, measurement variables, the name of the variable used to store the measurement names, and the name of the variable used to store the values. Here is an example of gathering the all measures for each state. Since the melt() function assumes that all non-id variables are measurement variables, the measure.vars argument doesn’t have to be explicitly stated in this case. However, for illustrative purposes, it is stated in the example below.
install.packages("reshape2") library(reshape2) USArrests_long3<-melt(USArrests, id.vars=c("States"), measure.vars=c("Murder", "Assault", "UrbanPop", "Rape"), value.name="Measures", variable.name="Values") head(USArrests_long3[order(USArrests_long3$States),], 10) ## States Values Measures ## 1 Alabama Murder 13.2 ## 51 Alabama Assault 236.0 ## 101 Alabama UrbanPop 58.0 ## 151 Alabama Rape 21.2 ## 2 Alaska Murder 10.0 ## 52 Alaska Assault 263.0 ## 102 Alaska UrbanPop 48.0 ## 152 Alaska Rape 44.5 ## 3 Arizona Murder 8.1 ## 53 Arizona Assault 294.0
As mentioned in the previous section, one might be interested in gathering the rates of arrests only. Here is how it is done with melt(). Be aware that the Values and Measures columns are named “Rates” and “Arrests” in line with the change.
USArrests_long4<-melt(USArrests, id.vars=c("States"), measure.vars=c("Murder", "Assault", "Rape"), value.name="Rates", variable.name="Arrests") head(USArrests_long4[order(USArrests_long4$States),], 10) ## States Arrests Rates ## 1 Alabama Murder 13.2 ## 51 Alabama Assault 236.0 ## 101 Alabama Rape 21.2 ## 2 Alaska Murder 10.0 ## 52 Alaska Assault 263.0 ## 102 Alaska Rape 44.5 ## 3 Arizona Murder 8.1 ## 53 Arizona Assault 294.0 ## 103 Arizona Rape 31.0 ## 4 Arkansas Murder 8.8
If the interest is having UrbanPop as an identity variable like in the first example then the code line should be rewritten as follows.
USArrests_long5<-melt(USArrests, id.vars=c("States", "UrbanPop"), measure.vars=c("Murder", "Assault", "Rape"), value.name="Rates", variable.name="Arrests") head(USArrests_long5[order(USArrests_long5$States),], 10) ## States UrbanPop Arrests Rates ## 1 Alabama 58 Murder 13.2 ## 51 Alabama 58 Assault 236.0 ## 101 Alabama 58 Rape 21.2 ## 2 Alaska 48 Murder 10.0 ## 52 Alaska 48 Assault 263.0 ## 102 Alaska 48 Rape 44.5 ## 3 Arizona 80 Murder 8.1 ## 53 Arizona 80 Assault 294.0 ## 103 Arizona 80 Rape 31.0 ## 4 Arkansas 50 Murder 8.8
Another flexibility of the melt() function is that it can work on arrays and matrices as well, whereas the gather() function can’t.
Video, Further Resources & Summary
Do you need more explanations on this topic? Then you might check out the following video from the Statistics Globe YouTube channel.
The YouTube video will be added soon.
Furthermore, you could have a look at some of the other tutorials on Statistics Globe:
- Data Manipulation in R
- Drop Multiple Columns from Data Frame Using dplyr Package in R
- Introduction to the R Programming Language
- Reshape Data Frame from Wide to Long Format in R
This article has demonstrated how to transform data frames from wide to long in the R programming language. If you have further questions, you may leave a comment below.
This page was created in collaboration with Cansu Kebabci. You might have a look at Cansu’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.
Statistics Globe Newsletter