Reshape Data with Multiple Measure Columns from Wide to Long Format in R (2 Examples)

This tutorial is about how to reshape data with multiple measure columns from wide to long format in R.

The table of content is structured as follows:

Let’s dive into it!

 

Add-On Packages & Example Data

There are two commonly used packages, namely tidyr and reshape2, for reshaping data. I’ll illustrate the use of the gather() and melt() functions of the tidyr and reshape2 packages, respectively.

Note that there are even further alternatives for converting data sets from wide to long format available. Recently, the pivot_longer function is getting more popular. You can learn more about this function in this tutorial.

However, for the demonstration in the present tutorial, the built-in dataset USArrests is imported with a simple data manipulation in order to change the default indexing. The data shows the arrests per 100.000 inhabitants for assault, murder and rape, and the percentage of the urban population for each 50 US states.

data("USArrests")
States<-rownames(USArrests)
USArrests<-cbind(States, USArrests, row.names=NULL)
head(USArrests, 10)
 
#         States Murder Assault UrbanPop Rape
# 1      Alabama   13.2     236       58 21.2
# 2       Alaska   10.0     263       48 44.5
# 3      Arizona    8.1     294       80 31.0
# 4     Arkansas    8.8     190       50 19.5
# 5   California    9.0     276       91 40.6
# 6     Colorado    7.9     204       78 38.7
# 7  Connecticut    3.3     110       77 11.1
# 8     Delaware    5.9     238       72 15.8
# 9      Florida   15.4     335       80 31.9
# 10     Georgia   17.4     211       60 25.8

As seen above, the data is in a wide format such that the multiple measures of a single state are recorded in one row.

 

Example 1: Wide to Long Format (The tidyr Package)

The gather() function uses 4 main arguments, which refer to the dataset, the name of the variable used to store the measurement names, the name of the variable used to store the values, and the columns to be gathered. Here is an example for gathering all the measures for each state.

install.packages("tidyr")
library(tidyr)
USArrests_long1<-gather(data=USArrests, key= Measures, value=Values, Murder:Rape)
head(USArrests_long1,10)
 
#         States Measures Values
# 1      Alabama   Murder   13.2
# 2       Alaska   Murder   10.0
# 3      Arizona   Murder    8.1
# 4     Arkansas   Murder    8.8
# 5   California   Murder    9.0
# 6     Colorado   Murder    7.9
# 7  Connecticut   Murder    3.3
# 8     Delaware   Murder    5.9
# 9      Florida   Murder   15.4
# 10     Georgia   Murder   17.4

The same output can also be displayed as sorted by the states as a matter of preference.

head(USArrests_long1[order(USArrests_long1$States),],10)
 
#      States Measures Values
# 1   Alabama   Murder   13.2
# 51  Alabama  Assault  236.0
# 101 Alabama UrbanPop   58.0
# 151 Alabama     Rape   21.2
# 2    Alaska   Murder   10.0
# 52   Alaska  Assault  263.0
# 102  Alaska UrbanPop   48.0
# 152  Alaska     Rape   44.5
# 3   Arizona   Murder    8.1
# 53  Arizona  Assault  294.0

One might be interested in gathering the rates of arrests only. In such a case, the column selection should be coded as follows. The select documentation can be visited for more advanced selection methods.

USArrests_long2<-gather(data=USArrests, key= Measures, value=Values, c(Murder:Rape, -UrbanPop))
head(USArrests_long2[order(USArrests_long2$States),],10)
 
#       States UrbanPop Measures Values
# 1    Alabama       58   Murder   13.2
# 51   Alabama       58  Assault  236.0
# 101  Alabama       58     Rape   21.2
# 2     Alaska       48   Murder   10.0
# 52    Alaska       48  Assault  263.0
# 102   Alaska       48     Rape   44.5
# 3    Arizona       80   Murder    8.1
# 53   Arizona       80  Assault  294.0
# 103  Arizona       80     Rape   31.0
# 4   Arkansas       50   Murder    8.8

It is important to note that omitting the UrbanPop variable from the column selection does not yield a variable drop in the dataset. Instead, the variable becomes an identity variable like States, hence a data manipulation is required if dropping is desired. In this respect, the melt() function is more intuitive and provides more flexibility. Let’s see how differently melt() operates.

 

Example 2: Wide to Long Format (The reshape2 Package)

The melt() function uses 5 main arguments which refer to the dataset, identity variables, measurement variables, the name of the variable used to store the measurement names, and the name of the variable used to store the values.

Here is an example of gathering the all measures for each state. Since the melt() function assumes that all non-id variables are measurement variables, the measure.vars argument doesn’t have to be explicitly stated in this case. However, for illustrative purposes, it is stated in the example below.

install.packages("reshape2")
library(reshape2)
USArrests_long3<-melt(USArrests, id.vars=c("States"), measure.vars=c("Murder", "Assault", "UrbanPop", "Rape"), value.name="Measures", variable.name="Values")
head(USArrests_long3[order(USArrests_long3$States),], 10)
 
##      States   Values Measures
## 1   Alabama   Murder     13.2
## 51  Alabama  Assault    236.0
## 101 Alabama UrbanPop     58.0
## 151 Alabama     Rape     21.2
## 2    Alaska   Murder     10.0
## 52   Alaska  Assault    263.0
## 102  Alaska UrbanPop     48.0
## 152  Alaska     Rape     44.5
## 3   Arizona   Murder      8.1
## 53  Arizona  Assault    294.0

As mentioned in the previous section, one might be interested in gathering the rates of arrests only. Here is how it is done with melt(). Be aware that the Values and Measures columns are named “Rates” and “Arrests” in line with the change.

USArrests_long4<-melt(USArrests, id.vars=c("States"), measure.vars=c("Murder", "Assault", "Rape"), value.name="Rates", variable.name="Arrests")
head(USArrests_long4[order(USArrests_long4$States),], 10)
 
##       States Arrests Rates
## 1    Alabama  Murder  13.2
## 51   Alabama Assault 236.0
## 101  Alabama    Rape  21.2
## 2     Alaska  Murder  10.0
## 52    Alaska Assault 263.0
## 102   Alaska    Rape  44.5
## 3    Arizona  Murder   8.1
## 53   Arizona Assault 294.0
## 103  Arizona    Rape  31.0
## 4   Arkansas  Murder   8.8

If the interest is having UrbanPop as an identity variable like in the first example then the code line should be rewritten as follows.

USArrests_long5<-melt(USArrests, id.vars=c("States", "UrbanPop"), measure.vars=c("Murder", "Assault", "Rape"), value.name="Rates", variable.name="Arrests")
head(USArrests_long5[order(USArrests_long5$States),], 10)
 
##       States UrbanPop Arrests Rates
## 1    Alabama       58  Murder  13.2
## 51   Alabama       58 Assault 236.0
## 101  Alabama       58    Rape  21.2
## 2     Alaska       48  Murder  10.0
## 52    Alaska       48 Assault 263.0
## 102   Alaska       48    Rape  44.5
## 3    Arizona       80  Murder   8.1
## 53   Arizona       80 Assault 294.0
## 103  Arizona       80    Rape  31.0
## 4   Arkansas       50  Murder   8.8

Another flexibility of the melt() function is that it can work on arrays and matrices as well, whereas the gather() function can’t.

 

Video, Further Resources & Summary

Do you need more explanations on this topic? Then you might check out the following video from the Statistics Globe YouTube channel.

 

The YouTube video will be added soon.

 

Furthermore, you could have a look at some of the other tutorials on Statistics Globe:

This article has demonstrated how to transform data frames from wide to long in the R programming language. If you have further questions, you may leave a comment below.

 

Rana Cansu Kebabci Statistician & Data Scientist

This page was created in collaboration with Cansu Kebabci. You might have a look at Cansu’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Top