Calculate Time Difference Between Two Columns of pandas DataFrame in Python

 

In this tutorial you’ll learn how to compute the time difference between two variables of a pandas DataFrame using the Python programming language.

The structure of the tutorial is shown below:

Without further ado, let’s dive into it!

 

Add-On Libraries and Data Initialization

As a first step, we have to import the pandas library as well as relativedelta from the dateutil module:

import pandas as pd
from dateutil import relativedelta

Then, we have to create an exemplifying pandas DataFrame that we can use in the examples later on:

data = {'date_col_1': [pd.Timestamp('2020-01-24 18:05:25.780000'), 
                       pd.Timestamp('2020-03-24 18:05:25.780000'), 
                       pd.Timestamp('2021-07-24 15:03:14'), 
                       pd.Timestamp('2020-01-24 18:05:25'), 
                       pd.Timestamp('2019-01-27 15:38:22')],
 
        'date_col_2': [pd.Timestamp('2017-03-08 09:05:35.870000'), 
                       pd.Timestamp('2020-01-24 15:03:14'), 
                       pd.Timestamp('2020-02-24 15:03:14'), 
                       pd.Timestamp('2019-01-27 15:38:22'), 
                       pd.Timestamp('2020-01-24 18:05:25')]}
 
df = pd.DataFrame(data)
 
df
#                date_col_1              date_col_2
# 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870
# 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000
# 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000
# 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000
# 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000

As you can see, our DataFrame contains two columns with dates and times.

 

Example 1.1: Using the Minus Operator to Calculate Days, Hours, Minutes & Seconds

The difference between two date objects can be calculated with the minus operator. This will do the calculation for the entirety of two columns, the result will be in days, hours, seconds and microseconds.

df['date_diffs_simple'] = df['date_col_1'] - df['date_col_2']
df[["date_col_1", "date_col_2", "date_diffs_simple"]]
#                date_col_1              date_col_2         date_diffs_simple
# 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 1052 days 08:59:49.910000
# 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000   60 days 03:02:11.780000
# 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000         516 days 00:00:00
# 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000         362 days 02:27:03
# 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000       -363 days +21:32:57

 

Example 1.2: Using the Minus Operator to Calculate Total Seconds

If we want to get the difference as total seconds, we can use the following code snippet:

df['date_diffs_lambda'] = df.apply(lambda row: 
                                   (row['date_col_1'] - row['date_col_2']).total_seconds()
                                   ,axis=1)
 
df[["date_col_1", "date_col_2", "date_diffs_lambda"]]
#                date_col_1              date_col_2  date_diffs_lambda
# 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870        90925189.91
# 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000         5194931.78
# 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000        44582400.00
# 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000        31285623.00
# 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000       -31285623.00

As you can see, the final set of data has a negative difference result because the dates are entered in the wrong order. In order to prevent this, we can use a custom function. Next, I’ll explain how to do that…

 

Example 2: Using a Custom Function

The following function is also used in the tutorial “how to calculate the time difference between two datetimes”, but we will go through it here as well.

The function sets some time units based on the number of seconds and then partitions the total time difference to these manually set time units and returns it in a formatted way.

def date_difference_calculator(row):
    date1 = row['date_col_1']
    date2 = row['date_col_2']
 
    if date1 < date2:
        diff = date2 - date1
    else:
        diff = date1 - date2
 
    minute  = 60
    hour    = minute * 60
    day     = hour * 24
 
    # Calculate the amount of full weeks
    # Use remainder of days to calc hours
    # Use remainder of hours to calc minutes
    # Use remainder of minutes to calc seconds
 
    days    = divmod(diff.total_seconds(), day)
    hours   = divmod(days[1], hour)
    minutes = divmod(hours[1], minute)
    seconds = minutes[1]
 
    return str(round(days[0])) + " days " + "{:0>2}".format(round(hours[0])) + ":" + "{:0>2}".format(round(minutes[0])) + ":" + "{:0>2}".format(round(seconds))

Next, we can use the apply function to use our user-defined function for our DataFrame:

df['date_diffs_custom'] = df.apply(date_difference_calculator, axis=1)
df[["date_col_1", "date_col_2", "date_diffs_custom"]]
#                date_col_1              date_col_2   date_diffs_custom
# 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870  1052 days 08:59:50
# 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000    60 days 03:02:12
# 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000   516 days 00:00:00
# 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000   362 days 02:27:03
# 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000   362 days 02:27:03

We can also use built-in functions from imported libraries, so let’s move on to the next example.

 

Example 3.1: Using relativedelta Library

Another way to calculate the difference between the two pandas DataFrame columns can be using a built-in library like relativedelta.

A major difference compared to the previous example is the usage of the lambda function. A lambda function is a small nameless function, which we can use to do row-based custom operations on the DataFrame.

In this specific example, we need it to use the abs() function to prevent any negative date_diff results.

df['relative_diff'] = df.apply(lambda row: 
                               abs(relativedelta.relativedelta(row['date_col_2'],
                                                               row['date_col_1'])), 
                               axis=1)
 
df[["date_col_1","date_col_2", "relative_diff"]]
#                date_col_1  ...                                      relative_diff
# 0 2020-01-24 18:05:25.780  ...  relativedelta(years=+2, months=+10, days=+16, ...
# 1 2020-03-24 18:05:25.780  ...  relativedelta(months=+2, hours=+3, minutes=+2,...
# 2 2021-07-24 15:03:14.000  ...                 relativedelta(years=+1, months=+5)
# 3 2020-01-24 18:05:25.000  ...  relativedelta(months=+11, days=+28, hours=+2, ...
# 4 2019-01-27 15:38:22.000  ...  relativedelta(months=+11, days=+28, hours=+2, ...

 

Example 3.2: Using relativedelta Library with Extended Formatting

The previous method returned the difference as a relativedelta object, if we want to format the output as a string, then we can use the method in the following example:

df['relative_diff_str'] = df.apply(lambda row: 
                                str(row['relative_diff'].years) + " years " + 
                                str(row['relative_diff'].months) + " months " +
                                str(row['relative_diff'].days) + " days " +
                                str(row['relative_diff'].hours) + " hours " +
                                str(row['relative_diff'].minutes) + " minutes " +
                                str(row['relative_diff'].seconds) + " seconds", axis = 1)
 
df[["date_col_1","date_col_2", "relative_diff_str"]]
#                date_col_1  ...                                  relative_diff_str
# 0 2020-01-24 18:05:25.780  ...  2 years 10 months 16 days 8 hours 59 minutes 5...
# 1 2020-03-24 18:05:25.780  ...  0 years 2 months 0 days 3 hours 2 minutes 12 s...
# 2 2021-07-24 15:03:14.000  ...  1 years 5 months 0 days 0 hours 0 minutes 0 se...
# 3 2020-01-24 18:05:25.000  ...  0 years 11 months 28 days 2 hours 27 minutes 3...
# 4 2019-01-27 15:38:22.000  ...  0 years 11 months 28 days 2 hours 27 minutes 3...

 

Example 3.3: Using relativedelta Library with Compact Formatting

If you think the previous leads to too long outputs, then you might use the following method to generate a more compact output.

The format method is used to fill the output with characters of the user’s choice to set the number of characters.

In this case, if the month difference is 5, then it will be altered to be 05 and so on.

df['relative_diff_str_2'] = df.apply(lambda row:
                                "{:0>4}".format(row['relative_diff'].years) + "-" + 
                                "{:0>2}".format(row['relative_diff'].months) + "-" +
                                "{:0>2}".format(row['relative_diff'].days) + " " +
                                "{:0>2}".format(row['relative_diff'].hours) + ":" +
                                "{:0>2}".format(row['relative_diff'].minutes) + ":" +
                                "{:0>2}".format(row['relative_diff'].seconds), axis = 1)
 
df[["date_col_1","date_col_2", "relative_diff_str_2"]]
#                date_col_1              date_col_2  relative_diff_str_2
# 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870  0002-10-16 08:59:50
# 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000  0000-02-00 03:02:12
# 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000  0001-05-00 00:00:00
# 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000  0000-11-28 02:27:03
# 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000  0000-11-28 02:27:03

 

Video, Further Resources & Summary

Do you need more explanations on how to calculate the time difference in Python? Then you should have a look at the following YouTube video of the Statistics Globe YouTube channel.

 

The YouTube video will be added soon.

 

Furthermore, you could have a look at some of the other tutorials on Statistics Globe:

This post has shown how to get the time difference between two columns of a pandas DataFrame. In case you have further questions, you may leave a comment below.

 

Ömer Ekiz Informatics Expert

This page was created in collaboration with Ömer Ekiz. You may have a look at Ömer’s author page to read more about his academic background and the other articles he has written for Statistics Globe.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Top