Reshape Data with Multiple Measure Columns from Long to Wide in R (3 Examples)

In this tutorial I’ll demonstrate how to convert a data set containing multiple measure variables from long to wide format using the R programming language.

The table of content is structured as follows:

Let’s dive right into it!

 

Add-On Packages & Example Data

We transformed the UsArrests data into a long format in the tutorial Reshape Data with Multiple Measure Columns from Wide to Long in R. In this tutorial, I will show how to back-transform the long formatted USArrests data to its original wide format via the tidyr and reshape2 packages.

As complementarities to the melt() and gather() functions, the dcast() and spread() functions will be demonstrated. The updated version of the spread() function, pivot_wider(), will also be shown in the third example.

Let’s first remember one way of converting the data to its long format. For other functions, see this tutorial.

install.packages("reshape2")
library(reshape2)
data("USArrests")
States<-rownames(USArrests)
USArrests<-cbind(States, USArrests, row.names=NULL)
 
USArrests_long<-melt(USArrests,
                     id.vars=c("States"), 
                     measure.vars=c("Murder", "Assault", "UrbanPop", "Rape"), 
                     value.name="Values", 
                     variable.name="Measures")
head(USArrests_long,10)
 
##         States Measures Values
## 1      Alabama   Murder   13.2
## 2       Alaska   Murder   10.0
## 3      Arizona   Murder    8.1
## 4     Arkansas   Murder    8.8
## 5   California   Murder    9.0
## 6     Colorado   Murder    7.9
## 7  Connecticut   Murder    3.3
## 8     Delaware   Murder    5.9
## 9      Florida   Murder   15.4
## 10     Georgia   Murder   17.4

Now it’s time to transform the data back to the wide format.

 

Example 1: Long to Wide Format (reshape2 Package – dcast() Function)

The dcast() function uses two main arguments, which refer to the dataset and the formula indicating the shape of the wide format. The left-hand side of the formula represents the id columns, whereas the right-hand side refers to the measurement names column to be assigned to the column names in the wide format. Additional id variables can be added to the formula via the “+” sign on the left-hand side.

In the USArrests_long dataset, we have one id variable, States, and one measurement names column, Measures. There is no need to mention the value column explicitly, as it is guessed by default. However, for illustrative purposes, the argument value.var is stated in the example below.

USArrests_wide1<-dcast(data=USArrests_long, formula=States ~ Measures, value.var="Values")
head(USArrests_wide1,10)
 
##         States Murder Assault UrbanPop Rape
## 1      Alabama   13.2     236       58 21.2
## 2       Alaska   10.0     263       48 44.5
## 3      Arizona    8.1     294       80 31.0
## 4     Arkansas    8.8     190       50 19.5
## 5   California    9.0     276       91 40.6
## 6     Colorado    7.9     204       78 38.7
## 7  Connecticut    3.3     110       77 11.1
## 8     Delaware    5.9     238       72 15.8
## 9      Florida   15.4     335       80 31.9
## 10     Georgia   17.4     211       60 25.8

The data can also have multiple value columns in the long format. Let’s assume that the same measurements were taken 10 years after and the values increased by 2%. Here is how the data would look like.

New_USArrests_long<-data.frame(USArrests_long[,1:2], 
                               Values_1973=USArrests_long$Values,
                               Values_1983=USArrests_long$Values*1.02)
head(New_USArrests_long, 10)
 
##         States Measures Values_1973 Values_1983
## 1      Alabama   Murder        13.2      13.464
## 2       Alaska   Murder        10.0      10.200
## 3      Arizona   Murder         8.1       8.262
## 4     Arkansas   Murder         8.8       8.976
## 5   California   Murder         9.0       9.180
## 6     Colorado   Murder         7.9       8.058
## 7  Connecticut   Murder         3.3       3.366
## 8     Delaware   Murder         5.9       6.018
## 9      Florida   Murder        15.4      15.708
## 10     Georgia   Murder        17.4      17.748

To convert multiple value columns from long to wide format, dcast.data.table() can be employed. As the name applies, data should be a data.table object in the use of dcast.data.table() function. See the tutorial data.table vs. data.frame in R for the comparison of data.frame and data.table objects.

For the implementation, the data.table package should be installed first. Then the transformed data by the setDT() function can be plugged into dcast.data.table().

install.packages("data.table")
library("data.table")
New_USArrests_wide<-dcast.data.table(data=setDT(New_USArrests_long), 
                                     formula=States ~ Measures, 
                                     value.var=c("Values_1973", "Values_1983"))
head(New_USArrests_wide, 10)
 
## Key: <States>
##          States Values_1973_Murder Values_1973_Assault Values_1973_UrbanPop
##          <char>              <num>               <num>                <num>
##  1:     Alabama               13.2                 236                   58
##  2:      Alaska               10.0                 263                   48
##  3:     Arizona                8.1                 294                   80
##  4:    Arkansas                8.8                 190                   50
##  5:  California                9.0                 276                   91
##  6:    Colorado                7.9                 204                   78
##  7: Connecticut                3.3                 110                   77
##  8:    Delaware                5.9                 238                   72
##  9:     Florida               15.4                 335                   80
## 10:     Georgia               17.4                 211                   60
 
##     Values_1973_Rape Values_1983_Murder Values_1983_Assault
##                <num>              <num>               <num>
##  1:             21.2             13.464              240.72
##  2:             44.5             10.200              268.26
##  3:             31.0              8.262              299.88
##  4:             19.5              8.976              193.80
##  5:             40.6              9.180              281.52
##  6:             38.7              8.058              208.08
##  7:             11.1              3.366              112.20
##  8:             15.8              6.018              242.76
##  9:             31.9             15.708              341.70
## 10:             25.8             17.748              215.22
 
##     Values_1983_UrbanPop Values_1983_Rape
##                    <num>            <num>
##  1:                59.16           21.624
##  2:                48.96           45.390
##  3:                81.60           31.620
##  4:                51.00           19.890
##  5:                92.82           41.412
##  6:                79.56           39.474
##  7:                78.54           11.322
##  8:                73.44           16.116
##  9:                81.60           32.538
## 10:                61.20           26.316

Another flexibility of the reshape2 package is that it also provides a version of dcast(), acast(), which can work on arrays and matrices.

 

Example 2: Long to Wide Format (tidyr Package – spread() Function)

The spread() function does not use a formula to indicate the data shape. Instead, it directly calls the measurement names column and value column via the key and value arguments, respectively. Here is an example of how spread() operates.

install.packages("tidyr")
library(tidyr)
USArrests_wide2<-spread(data=USArrests_long, key=Measures, value=Values)
head(USArrests_wide2,10)
 
##         States Murder Assault UrbanPop Rape
## 1      Alabama   13.2     236       58 21.2
## 2       Alaska   10.0     263       48 44.5
## 3      Arizona    8.1     294       80 31.0
## 4     Arkansas    8.8     190       50 19.5
## 5   California    9.0     276       91 40.6
## 6     Colorado    7.9     204       78 38.7
## 7  Connecticut    3.3     110       77 11.1
## 8     Delaware    5.9     238       72 15.8
## 9      Florida   15.4     335       80 31.9
## 10     Georgia   17.4     211       60 25.8

As seen above, there is no explicit expression for the id variables like in the case of dcast(). The unstated variables are automatically treated as id variables. Let’s see how the same code line works for data in a different setting.

USArrests_long2<-melt(USArrests,
                     id.vars=c("States", "UrbanPop"), 
                     measure.vars=c("Murder", "Assault", "Rape"), 
                     value.name="Values", 
                     variable.name="Measures")
head(USArrests_long2,10)
 
##         States UrbanPop Measures Values
## 1      Alabama       58   Murder   13.2
## 2       Alaska       48   Murder   10.0
## 3      Arizona       80   Murder    8.1
## 4     Arkansas       50   Murder    8.8
## 5   California       91   Murder    9.0
## 6     Colorado       78   Murder    7.9
## 7  Connecticut       77   Murder    3.3
## 8     Delaware       72   Murder    5.9
## 9      Florida       80   Murder   15.4
## 10     Georgia       60   Murder   17.4
 
USArrests_wide3<-spread(data=USArrests_long2, key=Measures, value=Values)
head(USArrests_wide3,10)
 
##         States UrbanPop Murder Assault Rape
## 1      Alabama       58   13.2     236 21.2
## 2       Alaska       48   10.0     263 44.5
## 3      Arizona       80    8.1     294 31.0
## 4     Arkansas       50    8.8     190 19.5
## 5   California       91    9.0     276 40.6
## 6     Colorado       78    7.9     204 38.7
## 7  Connecticut       77    3.3     110 11.1
## 8     Delaware       72    5.9     238 15.8
## 9      Florida       80   15.4     335 31.9
## 10     Georgia       60   17.4     211 25.8

The spread() function does not have the flexibility to work with multiple value columns. However, those who might be interested can check the multiple_spread() function shared by Dan Sullivan on the RStudio (Posit) community website.

 

Example 3: Long to Wide Format (tidyr Package – pivot_wider() Function)

The pivot() function is maybe the most intuitive and flexible one among the options. It uses 4 main arguments: data, names_from, values_from, and id_cols.

As the names apply, the data, names_from, values_from, and id_cols arguments refer to the data, measurement names column, value column, and id column, respectively. If id_cols is not stated, it is selected from the column(s) which are not specified by the names_from and values_from arguments.

Let’s see how it works with an example.

USArrests_wide4<-pivot_wider(data=USArrests_long2, names_from=Measures, values_from=Values)
head(USArrests_wide4, 10)
 
## # A tibble: 10 × 5
##    States      UrbanPop Murder Assault  Rape
##    <chr>          <int>  <dbl>   <dbl> <dbl>
##  1 Alabama           58   13.2     236  21.2
##  2 Alaska            48   10       263  44.5
##  3 Arizona           80    8.1     294  31  
##  4 Arkansas          50    8.8     190  19.5
##  5 California        91    9       276  40.6
##  6 Colorado          78    7.9     204  38.7
##  7 Connecticut       77    3.3     110  11.1
##  8 Delaware          72    5.9     238  15.8
##  9 Florida           80   15.4     335  31.9
## 10 Georgia           60   17.4     211  25.8

As one can see, States and UrbanPop are the id columns. If the id column is to be reduced to States only, the computation should be as follows.

USArrests_wide5<-pivot_wider(data=USArrests_long2,
                             names_from=Measures,
                             values_from=Values,
                             id_cols=States)
head(USArrests_wide5, 10)
 
## # A tibble: 10 × 4
##    States      Murder Assault  Rape
##    <chr>        <dbl>   <dbl> <dbl>
##  1 Alabama       13.2     236  21.2
##  2 Alaska        10       263  44.5
##  3 Arizona        8.1     294  31  
##  4 Arkansas       8.8     190  19.5
##  5 California     9       276  40.6
##  6 Colorado       7.9     204  38.7
##  7 Connecticut    3.3     110  11.1
##  8 Delaware       5.9     238  15.8
##  9 Florida       15.4     335  31.9
## 10 Georgia       17.4     211  25.8

The pivot_wider() function also enables working with multiple value columns. It is simply implemented by plugging a vector of value columns in the function. See the example below.

USArrests_wide5<-pivot_wider(data=New_USArrests_long,
                             names_from=Measures, 
                             values_from=c(Values_1973, Values_1983),
                             id_cols=States)
head(USArrests_wide5, 10)
 
## # A tibble: 10 × 9
##    States      Values_…¹ Value…² Value…³ Value…⁴ Value…⁵ Value…⁶ Value…⁷ Value…⁸
##    <chr>           <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Alabama          13.2     236      58    21.2   13.5     241.    59.2    21.6
##  2 Alaska           10       263      48    44.5   10.2     268.    49.0    45.4
##  3 Arizona           8.1     294      80    31      8.26    300.    81.6    31.6
##  4 Arkansas          8.8     190      50    19.5    8.98    194.    51      19.9
##  5 California        9       276      91    40.6    9.18    282.    92.8    41.4
##  6 Colorado          7.9     204      78    38.7    8.06    208.    79.6    39.5
##  7 Connecticut       3.3     110      77    11.1    3.37    112.    78.5    11.3
##  8 Delaware          5.9     238      72    15.8    6.02    243.    73.4    16.1
##  9 Florida          15.4     335      80    31.9   15.7     342.    81.6    32.5
## 10 Georgia          17.4     211      60    25.8   17.7     215.    61.2    26.3
## # … with abbreviated variable names ¹​Values_1973_Murder, ²​Values_1973_Assault,
## #   ³​Values_1973_UrbanPop, ⁴​Values_1973_Rape, ⁵​Values_1983_Murder,
## #   ⁶​Values_1983_Assault, ⁷​Values_1983_UrbanPop, ⁸​Values_1983_Rape

Video, Further Resources & Summary

Do you need more explanations on this topic? Then you might check out the following video from the Statistics Globe YouTube channel.

 

The YouTube video will be added soon.

 

Furthermore, you could have a look at some of the other tutorials on Statistics Globe:

This article has demonstrated how to transform data frames from long to wide in the R programming language. If you have further questions, you may leave a comment below.

 

Rana Cansu Kebabci Statistician & Data Scientist

This page was created in collaboration with Cansu Kebabci. You might have a look at Cansu’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Top