Module 18 – Handling Dates & Times Using dplyr & lubridate
Understanding how to manipulate date and time data is invaluable for any data analyst dealing with time-series data or any data set with temporal elements! Module 18 focuses on handling dates and times in R, utilizing the dplyr package along with the lubridate package. This module’s video lecture provides comprehensive guidance on manipulating and formatting date and time data. The R code examples demonstrate various operations, such as converting character strings to date and time objects, extracting components like year, month, day, hour, minute, and second, and performing rounding operations on date-time data. Exercises included in the module allow you to practice these important skills.
Video Lecture
Exercises
We’ll use the same data as in Module 17 for the exercises below (data attribution). You may download it here.
- Import the temperature data set
temp-data-bind-rows.csv
into R and store it in a tibble namedtemp_data3
. - Extract and create three new columns for the year, month, and day from the
date
column intemp_data3
usingyear()
,month()
, andday()
functions fromlubridate
. - Create a new column that rounds the date to by day using
round_date()
. - Group
temp_data3
by the new rounded date column and calculate the average temperature for each rounded day. - Format the date column to display only the year and month, and store this in a new column.
- Filter
temp_data3
to include only the data from January 2021. - Create a new column that calculates the difference in days from “2021-01-01” to the dates in the date column.
The solutions to these exercises can be found at the bottom of this page.
R Code of This Lecture
# install.packages("tidyverse") # Install tidyverse packages library("tidyverse") # Load tidyverse packages set.seed(289644) # Ensure reproducibility vec_date <- seq(ymd("2024-01-01"), # Create date vector ymd("2024-12-31"), # Sequence of dates by day by = "day") %>% sample(5) %>% # Sample 5 dates as.character() # Convert to character vec_time <- seq(from = 0 * 3600, # Create time vector to = 24 * 3600, # Sequence by 10 seconds by = 10)[-1] %>% seconds_to_period() %>% # Convert to Period { sprintf("%02d:%02d:%02d", # Format times hour(.), minute(.), second(.)) } %>% sample(5) # Sample 5 times tib_dt <- tibble(date = vec_date, # Create dates & times tibble time = vec_time) tib_dt # Print tibble tib_dt <- tib_dt %>% # Convert character to Date mutate(date_new = ymd(date)) # Apply ymd() tib_dt # Print updated tibble tib_dt <- tib_dt %>% # Convert character to Period mutate(time_new = hms(time)) # Apply hms() tib_dt # Print updated tibble tib_dt <- tib_dt %>% # Date & time to POSIXt mutate(date_time = ymd_hms(str_c(date, time))) # Apply ymd_hms() & str_c() tib_dt # Print updated tibble tib_dt <- tib_dt %>% # Extract date values mutate(year = year(date), # Extract year month = month(date), # Extract month day = day(date)) # Extract day tib_dt # Print updated tibble tib_dt <- tib_dt %>% # Extract time values mutate(hour = hour(date_time), # Extract hours minute = minute(date_time), # Extract minutes second = second(date_time)) # Extract seconds tib_dt # Print updated tibble tib_dt <- tib_dt %>% # Round dates & times mutate(round_date_hour = round_date(date_time, "hour"), # Round by hours round_date_month = round_date(date_time, "month")) # Round by months tib_dt # Print updated tibble tib_dt <- tib_dt %>% # Extract certain components mutate(extract_date_month = format(date_time, "%Y-%m")) # year & month tib_dt # Print updated tibble tib_dt <- tib_dt %>% # Convert character to Date mutate(date_new = ymd(date), # Apply ymd() time_new = hms(time), # Apply hms() date_time = ymd_hms(str_c(date, time)), # Apply ymd_hms() & str_c() year = year(date), # Extract year month = month(date), # Extract month day = day(date), # Extract day hour = hour(date_time), # Extract hours minute = minute(date_time), # Extract minutes second = second(date_time), # Extract seconds round_date_hour = round_date(date_time, "hour"), # Round by hours round_date_month = round_date(date_time, "month"), # Round by months extract_date_month = format(date_time, "%Y-%m")) # year & month tib_dt # Print updated tibble
Exercise Solutions
Below, you can find our solutions for the exercises of this module. Before we can jump into the exercises, we will install and load the tidyverse
packages.
install.packages("tidyverse") # Install tidyverse packages library(tidyverse) # Load tidyverse packages
With the tidyverse
packages installed and loaded, we can now proceed to the exercise solutions.
Exercise 1: Import the temperature data set temp-data-bind-rows.csv
into R and store it in a tibble named temp_data3
.
my_path = "path to CSV file on your computer" # Specify directory path temp_data3 <- read_csv(str_c(my_path,"temp-data-bind-rows.csv")) # Import temperature CSV file temp_data3 # Print tibble
We used the str_c()
function to combine the path and file names and parsed this combined string to the read_csv()
function to import the data set temp-data-bind-rows.
Exercise 2: Extract and create three new columns for the year, month, and day from the date
column in temp_data3
using year()
, month()
, and day()
functions from lubridate
.
temp_data3 <- temp_data3 %>% # Extract date values mutate(year = year(date), # Extract year month = month(date), # Extract month day = day(date)) # Extract day temp_data3 # Print updated tibble
In the example above, we used lubridate
‘s year()
, month()
, and day()
functions inside the mutate()
function of dplyr
to extract year, month, and day from the date
column of the tibble and stored the data in new columns year
, month
, and day
.
Exercise 3: Create a new column that rounds the date to by day using round_date()
.
temp_data3 <- temp_data3 %>% # Round date & times mutate(round_date = round_date(date,"day")) # Round date by day temp_data3 # Print updated tibble
Here we used the round_date()
function along with the mutate()
function to create a new column round_date
that contains the date rounded by days, effectively setting the time element to 00:00:00 (midnight). In other words, it standardizes the time component of each date to the start of that day.
Exercise 4: Group temp_data3
by the new rounded date column and calculate the average temperature for each rounded day.
temp_data3_grpd <- temp_data3 %>% # Group temperature data group_by(round_date) %>% # Group by round_date summarize(avg_temp = mean(temp, na.rm =TRUE)) # Calculate average temperature temp_data3_grpd # Print grouped tibble
In the above example, we used dplyr
‘s group_by()
function to group the data by the rows of round_date
column. Then we used the summarize()
function to create a new column wherein we calculated the average temperature by day, excluding the missing values via the na.rm = TRUE
argument. The grouped tibble was named temp_data3_grpd
.
Exercise 5: Format the date column to display only the year and month, and store this in a new column.
temp_data3 <- temp_data3 %>% # Extract certain components mutate(year_month = format(date,"%Y-%m")) # Extract year and month temp_data3 # Print updated tibble
Above, we used the format()
function in combination with the mutate()
function to create a new column year_month
which contains only the year and month data extracted from the date
column.
Exercise 6: Filter temp_data3
to include only the data from January 2021.
temp_data3 <- temp_data3 %>% # Filter data filter(month(date) == 01) # Filter by month temp_data3 # Print updated tibble
Here we used the month()
function inside the filter()
function to first extract all the months in the date
column of the data set and then matched that to the first month, i.e. January (01). That way, only the data for January was retained while other months were filtered out.
Exercise 7: Create a new column that calculates the difference in days from “2021-01-01” to the dates in the date column.
temp_data3 <- temp_data3 %>% # Find difference of dates mutate(diff_date = as.Date("2021-01-01") - as.Date(date)) # Find difference with respect to "2021-01-01" temp_data3 # Print updated tibble
In this question, we took the difference between the reference date and the date
column in temp_data3
. In this operation, converting the dates to the Date
class was crucial.
Solutions to these exercises were created in collaboration with Ifeanyi Idiaye and Cansu Kebabci. Thanks to them for their contribution!
Further Resources
- lubridate Documentation – lubridate
- Statistics Globe Article – Convert Character String with AM & PM to Date & Time in R
- Statistics Globe Article – Add & Subtract Months & Years to/from Date Object in R
- Statistics Globe Article – Get End of Month for Certain Date in R
- Statistics Globe Article – tz() Function of lubridate Package in R | Time Zone Component
.
You can access the course overview page, timetable, and table of contents by clicking here.