Module 15 – Reshaping Data Using dplyr & tidyr
Module 15 covers reshaping data sets with dplyr and tidyr in R. It focuses on transforming data from wide format (each variable in its own column) to long format (stacked, with one column holding the variable names and another holding the values), and vice versa. The video lecture explores the pivot_longer() and pivot_wider() functions in detail and shows how restructuring your data can make it easier to analyze. To reinforce your understanding, the module includes practical exercises for applying these reshaping techniques.
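To make the two layouts concrete, here is a minimal sketch on a made-up tibble. The object names toy_wide and toy_long and the columns id, height, and weight are illustrative assumptions, not the lecture data.

```r
library(tidyverse)                        # Loads tidyr, dplyr, and tibble

# Hypothetical wide data: one row per person, one column per variable
toy_wide <- tibble(id     = c(1, 2),
                   height = c(170, 182),
                   weight = c(65, 80))

# Long format: one row per combination of person and variable
toy_long <- toy_wide %>%
  pivot_longer(cols = c("height", "weight"))

toy_long                                  # 4 rows with the columns id, name, value
```

The id column is not listed in cols, so it is kept as an identifier and repeated for every stacked row.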
Video Lecture
Exercises
- Import the sleep-productivity-data.csv file that was created in Module 14 into R and store it in a tibble named sleep_productivity.
- Use pivot_longer() to convert the sleep_productivity data set from wide to long format. The long format should have a column for the day, week number, and another for the type of measurement (e.g., sleep hours, productivity score), with corresponding values.
- Convert the long format data set back to wide format using pivot_wider().
The solutions to these exercises can be found at the bottom of this page.
Data & R Code of This Lecture
You may download the data set used in this lecture here.
# install.packages("tidyverse")                       # Install tidyverse packages
library("tidyverse")                                  # Load tidyverse packages

my_path <- "D:/Dropbox/Jock/Data Sets/dplyr Course/"  # Specify directory path

team_cf <- read_csv(str_c(my_path,                    # Import CSV file
                          "Team-Coffee-Freshness-Data.csv"))
team_cf                                               # Print tibble

team_cf_long <- team_cf %>%                           # Convert data to long format
  pivot_longer(cols = c("cups", "frlvl"))
team_cf_long                                          # Print data in long format

team_cf_wide <- team_cf_long %>%                      # Convert data to wide format
  pivot_wider()
team_cf_wide                                          # Print data in wide format
Exercise Solutions
Below, you can find our solutions for the exercises of this module. Before beginning the exercises, we will install and load the tidyverse and tidyr packages. The tidyverse package gives us the dplyr functions, and it also attaches tidyr, which provides the reshaping functions. Loading tidyr separately is therefore optional; we do it here only to make explicit where pivot_longer() and pivot_wider() come from.
install.packages(c("tidyverse", "tidyr"))  # Install tidyverse and tidyr packages
library(tidyverse)                         # Load tidyverse packages
library(tidyr)                             # Load tidyr package
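As a quick check of how the two packages relate, the following sketch uses tidyverse_packages() from the tidyverse package to list its member packages. It assumes the tidyverse is already installed.

```r
library(tidyverse)                          # Load tidyverse packages

core_members <- tidyverse_packages()        # Names of all packages in the tidyverse
c("dplyr", "tidyr") %in% core_members       # Check whether dplyr and tidyr are members
```

Both checks should come back TRUE, which is why a single library(tidyverse) call is enough to use the pivoting functions.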
With the tidyverse and tidyr packages loaded, we can now proceed to the solutions of the exercises.
Exercise 1: Import the sleep-productivity-data.csv file that was created in Module 14 into R and store it in a tibble named sleep_productivity.
data_path <- "path to sleep-productivity-data.csv"                 # Specify directory path
sleep_productivity <- read_csv(str_c(data_path,                    # Import CSV file
                                     "sleep-productivity-data.csv"))
sleep_productivity                                                 # Print data
To import the sleep-productivity-data.csv file, we first specified the directory that contains the CSV file on our computer. We then used str_c() to join that directory path with the file name and passed the result to read_csv(). Because str_c() simply concatenates the two strings, the directory path must end with a slash. The imported data is stored in the tibble sleep_productivity.
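If you do not have the Module 14 file at hand, the import step can be tried on a stand-in file. The sketch below is an assumption-laden illustration: the file name sleep-productivity-demo.csv, the object names, and the columns day and week are hypothetical, and file.path() is used here as an alternative to str_c() because it inserts the path separator itself.

```r
library(tidyverse)                                   # Load tidyverse packages

# Hypothetical stand-in for the Module 14 data, written to a temporary folder
demo_dir  <- tempdir()
demo_data <- tibble(day                = c("Mon", "Tue"),
                    week               = c(1, 1),
                    sleep_hours        = c(7.5, 6),
                    productivity_score = c(8, 5))
write_csv(demo_data, file.path(demo_dir, "sleep-productivity-demo.csv"))

# file.path() adds the separator, so the folder needs no trailing slash
sleep_demo <- read_csv(file.path(demo_dir, "sleep-productivity-demo.csv"),
                       show_col_types = FALSE)       # Suppress the column type message
sleep_demo                                           # Print data
```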
Exercise 2: Use pivot_longer() to convert the sleep_productivity data set from wide to long format. The long format should have a column for the day, week number, and another for the type of measurement (e.g., sleep hours, productivity score), with corresponding values.
sleep_productivity_long <- sleep_productivity %>%              # Convert data to long format
  pivot_longer(cols = c("sleep_hours", "productivity_score"))
sleep_productivity_long                                        # Print data in long format
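In the above solution, we used the pivot_longer() function to convert the data to long format. The column names sleep_hours and productivity_score are stacked in a column called name, and their respective values are stored in a column called value. These are the default column names of pivot_longer(); all columns not listed in cols are kept as they are.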
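The default names name and value can be replaced with the names_to and values_to arguments of pivot_longer(). The following sketch runs on a small made-up tibble; the identifier columns day and week and the new column names measurement and score are assumptions chosen for illustration.

```r
library(tidyverse)                                    # Load tidyverse packages

# Hypothetical data with the assumed identifier columns day and week
sleep_demo <- tibble(day                = c("Mon", "Tue"),
                     week               = c(1, 1),
                     sleep_hours        = c(7.5, 6),
                     productivity_score = c(8, 5))

sleep_demo_long <- sleep_demo %>%                     # Convert data to long format
  pivot_longer(cols      = c("sleep_hours", "productivity_score"),
               names_to  = "measurement",             # Replaces the default "name"
               values_to = "score")                   # Replaces the default "value"
sleep_demo_long                                       # Print data in long format
```

Descriptive column names make the long data easier to read, but pivot_wider() then needs names_from and values_from set explicitly to reverse the operation.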
Exercise 3: Convert the long format data set back to wide format using pivot_wider().
sleep_productivity_wide <- sleep_productivity_long %>%  # Convert data to wide format
  pivot_wider()
sleep_productivity_wide                                 # Print data in wide format
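Here, we used the pivot_wider() function to convert the long-formatted data back into a wide format, which is the original structure of the tibble. Called without arguments, pivot_wider() takes the new column names from the name column and the cell values from the value column, which are exactly the columns that pivot_longer() created by default.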
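To see that the round trip restores the original layout, the sketch below repeats both steps on a small made-up tibble and compares the result with the input. The data and the columns day and week are hypothetical; names_from and values_from are spelled out only to show what the defaults are.

```r
library(tidyverse)                                     # Load tidyverse packages

# Hypothetical data with the assumed identifier columns day and week
sleep_demo <- tibble(day                = c("Mon", "Tue"),
                     week               = c(1, 1),
                     sleep_hours        = c(7.5, 6),
                     productivity_score = c(8, 5))

sleep_demo_long <- sleep_demo %>%                      # Wide to long
  pivot_longer(cols = c("sleep_hours", "productivity_score"))

sleep_demo_wide <- sleep_demo_long %>%                 # Long back to wide
  pivot_wider(names_from = name, values_from = value)  # Same as the defaults

identical(names(sleep_demo_wide), names(sleep_demo))   # Compare column names
identical(sleep_demo_wide$sleep_hours,                 # Compare one column's values
          sleep_demo$sleep_hours)
```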
Solutions to these exercises were created in collaboration with Ifeanyi Idiaye and Cansu Kebabci. Thanks to them for their contribution!
Further Resources
- tidyr Documentation – Pivot data from wide to long
- tidyr Documentation – Pivot data from long to wide
- UC Business Analytics R Programming Guide – Reshaping Your Data with tidyr
You can access the course overview page, timetable, and table of contents by clicking here.