Reshape Data with Multiple Measure Columns from Long to Wide in R (3 Examples)
In this tutorial I’ll demonstrate how to convert a data set containing multiple measure variables from long to wide format using the R programming language.
The table of content is structured as follows:
Let’s dive right into it!
Add-On Packages & Example Data
We transformed the UsArrests data into a long format in the tutorial Reshape Data with Multiple Measure Columns from Wide to Long in R. In this tutorial, I will show how to back-transform the long formatted USArrests data to its original wide format via the tidyr and reshape2 packages.
As complementarities to the melt() and gather() functions, the dcast() and spread() functions will be demonstrated. The updated version of the spread() function, pivot_wider(), will also be shown in the third example.
Let’s first remember one way of converting the data to its long format. For other functions, see this tutorial.
install.packages("reshape2") library(reshape2) data("USArrests") States<-rownames(USArrests) USArrests<-cbind(States, USArrests, row.names=NULL) USArrests_long<-melt(USArrests, id.vars=c("States"), measure.vars=c("Murder", "Assault", "UrbanPop", "Rape"), value.name="Values", variable.name="Measures") head(USArrests_long,10) ## States Measures Values ## 1 Alabama Murder 13.2 ## 2 Alaska Murder 10.0 ## 3 Arizona Murder 8.1 ## 4 Arkansas Murder 8.8 ## 5 California Murder 9.0 ## 6 Colorado Murder 7.9 ## 7 Connecticut Murder 3.3 ## 8 Delaware Murder 5.9 ## 9 Florida Murder 15.4 ## 10 Georgia Murder 17.4
Now it’s time to transform the data back to the wide format.
Example 1: Long to Wide Format (reshape2 Package – dcast() Function)
The dcast() function uses two main arguments, which refer to the dataset and the formula indicating the shape of the wide format. The left-hand side of the formula represents the id columns, whereas the right-hand side refers to the measurement names column to be assigned to the column names in the wide format. Additional id variables can be added to the formula via the “+” sign on the left-hand side.
In the USArrests_long dataset, we have one id variable, States, and one measurement names column, Measures. There is no need to mention the value column explicitly, as it is guessed by default. However, for illustrative purposes, the argument value.var is stated in the example below.
USArrests_wide1<-dcast(data=USArrests_long, formula=States ~ Measures, value.var="Values") head(USArrests_wide1,10) ## States Murder Assault UrbanPop Rape ## 1 Alabama 13.2 236 58 21.2 ## 2 Alaska 10.0 263 48 44.5 ## 3 Arizona 8.1 294 80 31.0 ## 4 Arkansas 8.8 190 50 19.5 ## 5 California 9.0 276 91 40.6 ## 6 Colorado 7.9 204 78 38.7 ## 7 Connecticut 3.3 110 77 11.1 ## 8 Delaware 5.9 238 72 15.8 ## 9 Florida 15.4 335 80 31.9 ## 10 Georgia 17.4 211 60 25.8
The data can also have multiple value columns in the long format. Let’s assume that the same measurements were taken 10 years after and the values increased by 2%. Here is how the data would look like.
New_USArrests_long<-data.frame(USArrests_long[,1:2], Values_1973=USArrests_long$Values, Values_1983=USArrests_long$Values*1.02) head(New_USArrests_long, 10) ## States Measures Values_1973 Values_1983 ## 1 Alabama Murder 13.2 13.464 ## 2 Alaska Murder 10.0 10.200 ## 3 Arizona Murder 8.1 8.262 ## 4 Arkansas Murder 8.8 8.976 ## 5 California Murder 9.0 9.180 ## 6 Colorado Murder 7.9 8.058 ## 7 Connecticut Murder 3.3 3.366 ## 8 Delaware Murder 5.9 6.018 ## 9 Florida Murder 15.4 15.708 ## 10 Georgia Murder 17.4 17.748
To convert multiple value columns from long to wide format, dcast.data.table() can be employed. As the name applies, data should be a data.table object in the use of dcast.data.table() function. See the tutorial data.table vs. data.frame in R for the comparison of data.frame and data.table objects.
For the implementation, the data.table package should be installed first. Then the transformed data by the setDT() function can be plugged into dcast.data.table().
install.packages("data.table") library("data.table") New_USArrests_wide<-dcast.data.table(data=setDT(New_USArrests_long), formula=States ~ Measures, value.var=c("Values_1973", "Values_1983")) head(New_USArrests_wide, 10) ## Key: <States> ## States Values_1973_Murder Values_1973_Assault Values_1973_UrbanPop ## <char> <num> <num> <num> ## 1: Alabama 13.2 236 58 ## 2: Alaska 10.0 263 48 ## 3: Arizona 8.1 294 80 ## 4: Arkansas 8.8 190 50 ## 5: California 9.0 276 91 ## 6: Colorado 7.9 204 78 ## 7: Connecticut 3.3 110 77 ## 8: Delaware 5.9 238 72 ## 9: Florida 15.4 335 80 ## 10: Georgia 17.4 211 60 ## Values_1973_Rape Values_1983_Murder Values_1983_Assault ## <num> <num> <num> ## 1: 21.2 13.464 240.72 ## 2: 44.5 10.200 268.26 ## 3: 31.0 8.262 299.88 ## 4: 19.5 8.976 193.80 ## 5: 40.6 9.180 281.52 ## 6: 38.7 8.058 208.08 ## 7: 11.1 3.366 112.20 ## 8: 15.8 6.018 242.76 ## 9: 31.9 15.708 341.70 ## 10: 25.8 17.748 215.22 ## Values_1983_UrbanPop Values_1983_Rape ## <num> <num> ## 1: 59.16 21.624 ## 2: 48.96 45.390 ## 3: 81.60 31.620 ## 4: 51.00 19.890 ## 5: 92.82 41.412 ## 6: 79.56 39.474 ## 7: 78.54 11.322 ## 8: 73.44 16.116 ## 9: 81.60 32.538 ## 10: 61.20 26.316
Another flexibility of the reshape2 package is that it also provides a version of dcast(), acast(), which can work on arrays and matrices.
Example 2: Long to Wide Format (tidyr Package – spread() Function)
The spread() function does not use a formula to indicate the data shape. Instead, it directly calls the measurement names column and value column via the key and value arguments, respectively. Here is an example of how spread() operates.
install.packages("tidyr") library(tidyr) USArrests_wide2<-spread(data=USArrests_long, key=Measures, value=Values) head(USArrests_wide2,10) ## States Murder Assault UrbanPop Rape ## 1 Alabama 13.2 236 58 21.2 ## 2 Alaska 10.0 263 48 44.5 ## 3 Arizona 8.1 294 80 31.0 ## 4 Arkansas 8.8 190 50 19.5 ## 5 California 9.0 276 91 40.6 ## 6 Colorado 7.9 204 78 38.7 ## 7 Connecticut 3.3 110 77 11.1 ## 8 Delaware 5.9 238 72 15.8 ## 9 Florida 15.4 335 80 31.9 ## 10 Georgia 17.4 211 60 25.8
As seen above, there is no explicit expression for the id variables like in the case of dcast(). The unstated variables are automatically treated as id variables. Let’s see how the same code line works for data in a different setting.
USArrests_long2<-melt(USArrests, id.vars=c("States", "UrbanPop"), measure.vars=c("Murder", "Assault", "Rape"), value.name="Values", variable.name="Measures") head(USArrests_long2,10) ## States UrbanPop Measures Values ## 1 Alabama 58 Murder 13.2 ## 2 Alaska 48 Murder 10.0 ## 3 Arizona 80 Murder 8.1 ## 4 Arkansas 50 Murder 8.8 ## 5 California 91 Murder 9.0 ## 6 Colorado 78 Murder 7.9 ## 7 Connecticut 77 Murder 3.3 ## 8 Delaware 72 Murder 5.9 ## 9 Florida 80 Murder 15.4 ## 10 Georgia 60 Murder 17.4 USArrests_wide3<-spread(data=USArrests_long2, key=Measures, value=Values) head(USArrests_wide3,10) ## States UrbanPop Murder Assault Rape ## 1 Alabama 58 13.2 236 21.2 ## 2 Alaska 48 10.0 263 44.5 ## 3 Arizona 80 8.1 294 31.0 ## 4 Arkansas 50 8.8 190 19.5 ## 5 California 91 9.0 276 40.6 ## 6 Colorado 78 7.9 204 38.7 ## 7 Connecticut 77 3.3 110 11.1 ## 8 Delaware 72 5.9 238 15.8 ## 9 Florida 80 15.4 335 31.9 ## 10 Georgia 60 17.4 211 25.8
The spread() function does not have the flexibility to work with multiple value columns. However, those who might be interested can check the multiple_spread() function shared by Dan Sullivan on the RStudio (Posit) community website.
Example 3: Long to Wide Format (tidyr Package – pivot_wider() Function)
The pivot() function is maybe the most intuitive and flexible one among the options. It uses 4 main arguments: data, names_from, values_from, and id_cols.
As the names apply, the data, names_from, values_from, and id_cols arguments refer to the data, measurement names column, value column, and id column, respectively. If id_cols is not stated, it is selected from the column(s) which are not specified by the names_from and values_from arguments.
Let’s see how it works with an example.
USArrests_wide4<-pivot_wider(data=USArrests_long2, names_from=Measures, values_from=Values) head(USArrests_wide4, 10) ## # A tibble: 10 × 5 ## States UrbanPop Murder Assault Rape ## <chr> <int> <dbl> <dbl> <dbl> ## 1 Alabama 58 13.2 236 21.2 ## 2 Alaska 48 10 263 44.5 ## 3 Arizona 80 8.1 294 31 ## 4 Arkansas 50 8.8 190 19.5 ## 5 California 91 9 276 40.6 ## 6 Colorado 78 7.9 204 38.7 ## 7 Connecticut 77 3.3 110 11.1 ## 8 Delaware 72 5.9 238 15.8 ## 9 Florida 80 15.4 335 31.9 ## 10 Georgia 60 17.4 211 25.8
As one can see, States and UrbanPop are the id columns. If the id column is to be reduced to States only, the computation should be as follows.
USArrests_wide5<-pivot_wider(data=USArrests_long2, names_from=Measures, values_from=Values, id_cols=States) head(USArrests_wide5, 10) ## # A tibble: 10 × 4 ## States Murder Assault Rape ## <chr> <dbl> <dbl> <dbl> ## 1 Alabama 13.2 236 21.2 ## 2 Alaska 10 263 44.5 ## 3 Arizona 8.1 294 31 ## 4 Arkansas 8.8 190 19.5 ## 5 California 9 276 40.6 ## 6 Colorado 7.9 204 38.7 ## 7 Connecticut 3.3 110 11.1 ## 8 Delaware 5.9 238 15.8 ## 9 Florida 15.4 335 31.9 ## 10 Georgia 17.4 211 25.8
The pivot_wider() function also enables working with multiple value columns. It is simply implemented by plugging a vector of value columns in the function. See the example below.
USArrests_wide5<-pivot_wider(data=New_USArrests_long, names_from=Measures, values_from=c(Values_1973, Values_1983), id_cols=States) head(USArrests_wide5, 10) ## # A tibble: 10 × 9 ## States Values_…¹ Value…² Value…³ Value…⁴ Value…⁵ Value…⁶ Value…⁷ Value…⁸ ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Alabama 13.2 236 58 21.2 13.5 241. 59.2 21.6 ## 2 Alaska 10 263 48 44.5 10.2 268. 49.0 45.4 ## 3 Arizona 8.1 294 80 31 8.26 300. 81.6 31.6 ## 4 Arkansas 8.8 190 50 19.5 8.98 194. 51 19.9 ## 5 California 9 276 91 40.6 9.18 282. 92.8 41.4 ## 6 Colorado 7.9 204 78 38.7 8.06 208. 79.6 39.5 ## 7 Connecticut 3.3 110 77 11.1 3.37 112. 78.5 11.3 ## 8 Delaware 5.9 238 72 15.8 6.02 243. 73.4 16.1 ## 9 Florida 15.4 335 80 31.9 15.7 342. 81.6 32.5 ## 10 Georgia 17.4 211 60 25.8 17.7 215. 61.2 26.3 ## # … with abbreviated variable names ¹Values_1973_Murder, ²Values_1973_Assault, ## # ³Values_1973_UrbanPop, ⁴Values_1973_Rape, ⁵Values_1983_Murder, ## # ⁶Values_1983_Assault, ⁷Values_1983_UrbanPop, ⁸Values_1983_Rape
Video, Further Resources & Summary
Do you need more explanations on this topic? Then you might check out the following video from the Statistics Globe YouTube channel.
The YouTube video will be added soon.
Furthermore, you could have a look at some of the other tutorials on Statistics Globe:
- Introduction to the R Programming Language
- Reshape Data with Multiple Measure Columns from Wide to Long Format in R
- Reshape Data Frame from Wide to Long Format in R
- data.table vs. data.frame in R
- data.table Package in R
This article has demonstrated how to transform data frames from long to wide in the R programming language. If you have further questions, you may leave a comment below.
This page was created in collaboration with Cansu Kebabci. You might have a look at Cansu’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.
Statistics Globe Newsletter