Module 18 – Handling Dates & Times Using dplyr & lubridate

Understanding how to manipulate date and time data is invaluable for any data analyst dealing with time-series data or any data set with temporal elements! Module 18 focuses on handling dates and times in R, utilizing the dplyr package along with the lubridate package. This module’s video lecture provides comprehensive guidance on manipulating and formatting date and time data. The R code examples demonstrate various operations, such as converting character strings to date and time objects, extracting components like year, month, day, hour, minute, and second, and performing rounding operations on date-time data. Exercises included in the module allow you to practice these important skills.

Video Lecture

Exercises

We’ll use the same data as in Module 17 for the exercises below (data attribution). You may download it here.

  1. Import the temperature data set temp-data-bind-rows.csv into R and store it in a tibble named temp_data3.
  2. Extract and create three new columns for the year, month, and day from the date column in temp_data3 using year(), month(), and day() functions from lubridate.
  3. Create a new column that rounds the date to by day using round_date().
  4. Group temp_data3 by the new rounded date column and calculate the average temperature for each rounded day.
  5. Format the date column to display only the year and month, and store this in a new column.
  6. Filter temp_data3 to include only the data from January 2021.
  7. Create a new column that calculates the difference in days from “2021-01-01” to the dates in the date column.

The solutions to these exercises can be found at the bottom of this page.

R Code of This Lecture

# install.packages("tidyverse")                   # Install tidyverse packages
library("tidyverse")                              # Load tidyverse packages
 
set.seed(289644)                                  # Ensure reproducibility
 
vec_date <- seq(ymd("2024-01-01"),                # Create date vector
                ymd("2024-12-31"),                # Sequence of dates by day
                by = "day") %>% 
  sample(5) %>%                                   # Sample 5 dates
  as.character()                                  # Convert to character
 
vec_time <- seq(from = 0 * 3600,                  # Create time vector
                to = 24 * 3600,                   # Sequence by 10 seconds
                by = 10)[-1] %>% 
  seconds_to_period() %>%                         # Convert to Period
  { sprintf("%02d:%02d:%02d",                     # Format times
            hour(.),
            minute(.),
            second(.)) } %>% 
  sample(5)                                       # Sample 5 times
 
tib_dt <- tibble(date = vec_date,                 # Create dates & times tibble
                 time = vec_time)
tib_dt                                            # Print tibble
 
tib_dt <- tib_dt %>%                              # Convert character to Date
  mutate(date_new = ymd(date))                    # Apply ymd()
tib_dt                                            # Print updated tibble
 
tib_dt <- tib_dt %>%                              # Convert character to Period
  mutate(time_new = hms(time))                    # Apply hms()
tib_dt                                            # Print updated tibble
 
tib_dt <- tib_dt %>%                              # Date & time to POSIXt
  mutate(date_time = ymd_hms(str_c(date, time)))  # Apply ymd_hms() & str_c()
tib_dt                                            # Print updated tibble
 
tib_dt <- tib_dt %>%                              # Extract date values
  mutate(year = year(date),                       # Extract year
         month = month(date),                     # Extract month
         day = day(date))                         # Extract day
tib_dt                                            # Print updated tibble
 
tib_dt <- tib_dt %>%                              # Extract time values
  mutate(hour = hour(date_time),                  # Extract hours
         minute = minute(date_time),              # Extract minutes
         second = second(date_time))              # Extract seconds
tib_dt                                            # Print updated tibble
 
tib_dt <- tib_dt %>%                              # Round dates & times
  mutate(round_date_hour = round_date(date_time, "hour"),   # Round by hours
         round_date_month = round_date(date_time, "month")) # Round by months
tib_dt                                            # Print updated tibble
 
tib_dt <- tib_dt %>%                              # Extract certain components
  mutate(extract_date_month = format(date_time, "%Y-%m")) # year & month
tib_dt                                            # Print updated tibble
 
tib_dt <- tib_dt %>%                              # Convert character to Date
  mutate(date_new = ymd(date),                    # Apply ymd()
         time_new = hms(time),                    # Apply hms()
         date_time = ymd_hms(str_c(date, time)),  # Apply ymd_hms() & str_c()
         year = year(date),                       # Extract year
         month = month(date),                     # Extract month
         day = day(date),                         # Extract day
         hour = hour(date_time),                  # Extract hours
         minute = minute(date_time),              # Extract minutes
         second = second(date_time),              # Extract seconds
         round_date_hour = round_date(date_time, "hour"),   # Round by hours
         round_date_month = round_date(date_time, "month"), # Round by months
         extract_date_month = format(date_time, "%Y-%m"))   # year & month
tib_dt                                            # Print updated tibble

Exercise Solutions

Below, you can find our solutions for the exercises of this module. Before we can jump into the exercises, we will install and load the tidyverse packages.

install.packages("tidyverse")	                                        # Install tidyverse packages
library(tidyverse)	                                                # Load tidyverse packages

With the tidyverse packages installed and loaded, we can now proceed to the exercise solutions.

Exercise 1: Import the temperature data set temp-data-bind-rows.csv into R and store it in a tibble named temp_data3.

my_path = "path to CSV file on your computer"		                # Specify directory path
 
temp_data3 <- read_csv(str_c(my_path,"temp-data-bind-rows.csv"))	# Import temperature CSV file
 
temp_data3	                                                        # Print tibble

We used the str_c() function to combine the path and file names and parsed this combined string to the read_csv() function to import the data set temp-data-bind-rows.

Exercise 2: Extract and create three new columns for the year, month, and day from the date column in temp_data3 using year(), month(), and day() functions from lubridate.

temp_data3 <- temp_data3 %>%	                                        # Extract date values 
  mutate(year = year(date),	                                        # Extract year
         month = month(date),	                                        # Extract month
         day = day(date))	                                        # Extract day
 
temp_data3	                                                        # Print updated tibble

In the example above, we used lubridate‘s year(), month(), and day() functions inside the mutate() function of dplyr to extract year, month, and day from the date column of the tibble and stored the data in new columns year, month, and day.

Exercise 3: Create a new column that rounds the date to by day using round_date().

temp_data3 <-  temp_data3 %>%	                                        # Round date & times
  mutate(round_date = round_date(date,"day"))	                        # Round date by day
 
temp_data3	                                                        # Print updated tibble

Here we used the round_date() function along with the mutate() function to create a new column round_date that contains the date rounded by days, effectively setting the time element to 00:00:00 (midnight). In other words, it standardizes the time component of each date to the start of that day.

Exercise 4: Group temp_data3 by the new rounded date column and calculate the average temperature for each rounded day.

temp_data3_grpd <- temp_data3 %>%	                                # Group temperature data 
  group_by(round_date) %>% 	                                        # Group by round_date
  summarize(avg_temp = mean(temp, na.rm =TRUE))	                        # Calculate average temperature
 
temp_data3_grpd	                                                        # Print grouped tibble

In the above example, we used dplyr‘s group_by() function to group the data by the rows of round_date column. Then we used the summarize() function to create a new column wherein we calculated the average temperature by day, excluding the missing values via the na.rm = TRUE argument. The grouped tibble was named temp_data3_grpd.

Exercise 5: Format the date column to display only the year and month, and store this in a new column.

temp_data3 <- temp_data3 %>%	                                        # Extract certain components 
  mutate(year_month = format(date,"%Y-%m"))	                        # Extract year and month
 
temp_data3	                                                        # Print updated tibble

Above, we used the format() function in combination with the mutate() function to create a new column year_month which contains only the year and month data extracted from the date column.

Exercise 6: Filter temp_data3 to include only the data from January 2021.

temp_data3 <- temp_data3 %>%	                                        # Filter data 
  filter(month(date) == 01)	                                        # Filter by month
 
temp_data3	                                                        # Print updated tibble

Here we used the month() function inside the filter() function to first extract all the months in the date column of the data set and then matched that to the first month, i.e. January (01). That way, only the data for January was retained while other months were filtered out.

Exercise 7: Create a new column that calculates the difference in days from “2021-01-01” to the dates in the date column.

temp_data3 <- temp_data3 %>%                                            # Find difference of dates                        
  mutate(diff_date = as.Date("2021-01-01") - as.Date(date))             # Find difference with respect to "2021-01-01" 
 
temp_data3                                                              # Print updated tibble

In this question, we took the difference between the reference date and the date column in temp_data3. In this operation, converting the dates to the Date class was crucial.

Solutions to these exercises were created in collaboration with Ifeanyi Idiaye and Cansu Kebabci. Thanks to them for their contribution!

Further Resources

 

Move to Previous Module Button

.

Move to Next Module Button

 

You can access the course overview page, timetable, and table of contents by clicking here.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


Top