Module 15 – Reshaping Data Using dplyr & tidyr

Module 15 covers reshaping data sets using dplyr and tidyr in R. It focuses on transforming data from wide format (each variable in its own column) to long format (variable names stacked in one column and their values in another), and vice versa. The video lecture explores the pivot_longer() and pivot_wider() functions in detail and shows how to restructure your data so that it suits the analysis at hand. To reinforce your understanding, the module includes practical exercises for applying these reshaping techniques.
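To make the difference between the two formats concrete, here is a minimal sketch. The scores_wide tibble and its column names are hypothetical toy data, not part of the course material.

```r
library(tidyverse)

# Hypothetical toy data in wide format: one column per subject
scores_wide <- tibble(
  student = c("A", "B"),
  math    = c(90, 75),
  art     = c(60, 85)
)

# Wide to long: subject names go to "name", scores go to "value"
scores_long <- scores_wide %>%
  pivot_longer(cols = c(math, art))
scores_long                                       # One row per student and subject

# Long to wide: spread "name" back into separate columns
scores_long %>%
  pivot_wider(names_from = name, values_from = value)
```

The long version has one row per student and subject, while the wide version has one row per student.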

Video Lecture

Exercises

  1. Import the sleep-productivity-data.csv file that was created in Module 14 into R and store it in a tibble named sleep_productivity.
  2. Use pivot_longer() to convert the sleep_productivity data set from wide to long format. The long format should keep the columns for the day and the week number, and it should contain one column for the type of measurement (e.g., sleep hours, productivity score) and one for the corresponding values.
  3. Convert the long format data set back to wide format using pivot_wider().

The solutions to these exercises can be found at the bottom of this page.

Data & R Code of This Lecture

You may download the data set used in this lecture here.

# install.packages("tidyverse")                   # Install tidyverse packages
library("tidyverse")                              # Load tidyverse packages
 
my_path <- "D:/Dropbox/Jock/Data Sets/dplyr Course/"  # Specify directory path
 
team_cf <- read_csv(str_c(my_path,                # Import CSV file
                          "Team-Coffee-Freshness-Data.csv"))
team_cf                                           # Print tibble
 
team_cf_long <- team_cf %>%                       # Convert data to long format
  pivot_longer(cols = c("cups", "frlvl"))
team_cf_long                                      # Print data in long format
 
team_cf_wide <- team_cf_long %>%                  # Convert data to wide format
  pivot_wider()
team_cf_wide                                      # Print data in wide format

Exercise Solutions

Below, you can find our solutions for the exercises of this module. Before beginning the exercises, we install and load the tidyverse. Loading the tidyverse attaches both dplyr and tidyr, so the reshaping functions pivot_longer() and pivot_wider() are available without loading tidyr separately.

# install.packages("tidyverse")                                                  # Install tidyverse packages (run once)
 
library(tidyverse)                                                               # Load tidyverse packages (attaches dplyr & tidyr)

With the tidyverse loaded, we can now proceed to the solutions of the exercises.

Exercise 1: Import the sleep-productivity-data.csv file that was created in Module 14 into R and store it in a tibble named sleep_productivity.

data_path <- "path/to/your/data/folder/"                                         # Specify directory path (placeholder; keep the trailing slash)
 
sleep_productivity <- read_csv(str_c(data_path, "sleep-productivity-data.csv"))  # Import CSV file
 
sleep_productivity                                                               # Print data

To import the data, we first specified the directory that contains the CSV file on our computer. We then used str_c() to join that directory path with the file name and passed the result to read_csv(). Because str_c() concatenates strings without adding a separator, the directory path must end with a slash. read_csv() returns a tibble, which we stored in sleep_productivity.
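The path handling can be sketched with a small demo file. The file name demo-data.csv and its contents are hypothetical and only serve to make the example self-contained. The base R function file.path() is shown as an alternative to str_c() because it inserts the separator itself.

```r
library(tidyverse)

# Hypothetical demo file in a temporary directory (not the course data)
demo_dir  <- tempdir()
demo_file <- "demo-data.csv"
write_csv(tibble(day = c("Mon", "Tue"), sleep_hours = c(7.5, 6)),
          file.path(demo_dir, demo_file))

# str_c() pastes strings as they are, so the "/" must be supplied explicitly
path_str_c <- str_c(demo_dir, "/", demo_file)

# file.path() adds the "/" between its arguments
path_file_path <- file.path(demo_dir, demo_file)

demo <- read_csv(path_file_path, show_col_types = FALSE)  # Import demo CSV file
demo                                                      # Print tibble
```

Both approaches build the same path here. file.path() simply removes the need to remember the trailing slash.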

Exercise 2: Use pivot_longer() to convert the sleep_productivity data set from wide to long format. The long format should keep the columns for the day and the week number, and it should contain one column for the type of measurement (e.g., sleep hours, productivity score) and one for the corresponding values.

sleep_productivity_long <- sleep_productivity %>%                                # Convert data to long format
  pivot_longer(cols = c("sleep_hours", "productivity_score"))
 
sleep_productivity_long                                                          # Print data in long format

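In the above solution, we used the pivot_longer() function to convert the data to long format. The cols argument selects the columns to stack: their names (sleep_hours and productivity_score) go into a column called name and their values into a column called value, which are the defaults of the names_to and values_to arguments. All other columns are kept as identifiers and repeated for each measurement.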
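If the default column names name and value are not descriptive enough, they can be set explicitly. The following is a minimal sketch on a hypothetical stand-in for the sleep_productivity data. The columns week and day and all values are assumptions, since the real file may look different.

```r
library(tidyverse)

# Hypothetical stand-in for sleep_productivity (real column names may differ)
sleep_demo <- tibble(
  week               = c(1, 1),
  day                = c("Mon", "Tue"),
  sleep_hours        = c(7.5, 6),
  productivity_score = c(8, 5)
)

sleep_demo_long <- sleep_demo %>%                 # Convert data to long format
  pivot_longer(cols      = c(sleep_hours, productivity_score),
               names_to  = "measurement",         # Column for the former column names
               values_to = "value")               # Column for the corresponding values
sleep_demo_long                                   # Print data in long format
```

Note that a later pivot_wider() call would then need names_from = measurement, because the column is no longer called name.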

Exercise 3: Convert the long format data set back to wide format using pivot_wider().

sleep_productivity_wide <- sleep_productivity_long %>%                           # Convert data to wide format
  pivot_wider()
 
sleep_productivity_wide                                                          # Print data in wide format

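Here, we used the pivot_wider() function to convert the long-formatted data back into wide format, restoring the original structure of the tibble. No arguments are needed because, by default, pivot_wider() takes the new column names from the name column and the cell values from the value column, which matches the output of pivot_longer() in Exercise 2.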
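One way to check that the round trip restores the original structure is sketched below. The roundtrip_demo tibble is hypothetical toy data, and the defaults are spelled out for readability.

```r
library(tidyverse)

# Hypothetical toy data (not the course data)
roundtrip_demo <- tibble(
  week               = c(1, 1),
  day                = c("Mon", "Tue"),
  sleep_hours        = c(7.5, 6),
  productivity_score = c(8, 5)
)

roundtrip_long <- roundtrip_demo %>%              # Wide to long
  pivot_longer(cols = c(sleep_hours, productivity_score))

roundtrip_wide <- roundtrip_long %>%              # Long back to wide
  pivot_wider(names_from = name, values_from = value)

# Compare column names, row count, and one measurement column
identical(names(roundtrip_demo), names(roundtrip_wide))
nrow(roundtrip_demo) == nrow(roundtrip_wide)
all.equal(roundtrip_demo$sleep_hours, roundtrip_wide$sleep_hours)
```

The round trip only works cleanly when the identifier columns (here week and day) uniquely identify each row. If they do not, pivot_wider() warns and returns list-columns instead.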

Solutions to these exercises were created in collaboration with Ifeanyi Idiaye and Cansu Kebabci. Thanks to them for their contribution!
