Calculate Time Difference Between Two Columns of pandas DataFrame in Python
In this tutorial you’ll learn how to compute the time difference between two variables of a pandas DataFrame using the Python programming language.
The structure of the tutorial is shown below:
Without further ado, let’s dive into it!
Add-On Libraries and Data Initialization
As a first step, we have to import the pandas library as well as relativedelta from the dateutil module:
import pandas as pd from dateutil import relativedelta |
import pandas as pd from dateutil import relativedelta
Then, we have to create an exemplifying pandas DataFrame that we can use in the examples later on:
data = {'date_col_1': [pd.Timestamp('2020-01-24 18:05:25.780000'), pd.Timestamp('2020-03-24 18:05:25.780000'), pd.Timestamp('2021-07-24 15:03:14'), pd.Timestamp('2020-01-24 18:05:25'), pd.Timestamp('2019-01-27 15:38:22')], 'date_col_2': [pd.Timestamp('2017-03-08 09:05:35.870000'), pd.Timestamp('2020-01-24 15:03:14'), pd.Timestamp('2020-02-24 15:03:14'), pd.Timestamp('2019-01-27 15:38:22'), pd.Timestamp('2020-01-24 18:05:25')]} df = pd.DataFrame(data) df # date_col_1 date_col_2 # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 |
data = {'date_col_1': [pd.Timestamp('2020-01-24 18:05:25.780000'), pd.Timestamp('2020-03-24 18:05:25.780000'), pd.Timestamp('2021-07-24 15:03:14'), pd.Timestamp('2020-01-24 18:05:25'), pd.Timestamp('2019-01-27 15:38:22')], 'date_col_2': [pd.Timestamp('2017-03-08 09:05:35.870000'), pd.Timestamp('2020-01-24 15:03:14'), pd.Timestamp('2020-02-24 15:03:14'), pd.Timestamp('2019-01-27 15:38:22'), pd.Timestamp('2020-01-24 18:05:25')]} df = pd.DataFrame(data) df # date_col_1 date_col_2 # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000
As you can see, our DataFrame contains two columns with dates and times.
Example 1.1: Using the Minus Operator to Calculate Days, Hours, Minutes & Seconds
The difference between two date objects can be calculated with the minus operator. This will do the calculation for the entirety of two columns, the result will be in days, hours, seconds and microseconds.
df['date_diffs_simple'] = df['date_col_1'] - df['date_col_2'] df[["date_col_1", "date_col_2", "date_diffs_simple"]] # date_col_1 date_col_2 date_diffs_simple # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 1052 days 08:59:49.910000 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 60 days 03:02:11.780000 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 516 days 00:00:00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 362 days 02:27:03 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 -363 days +21:32:57 |
df['date_diffs_simple'] = df['date_col_1'] - df['date_col_2'] df[["date_col_1", "date_col_2", "date_diffs_simple"]] # date_col_1 date_col_2 date_diffs_simple # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 1052 days 08:59:49.910000 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 60 days 03:02:11.780000 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 516 days 00:00:00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 362 days 02:27:03 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 -363 days +21:32:57
Example 1.2: Using the Minus Operator to Calculate Total Seconds
If we want to get the difference as total seconds, we can use the following code snippet:
df['date_diffs_lambda'] = df.apply(lambda row: (row['date_col_1'] - row['date_col_2']).total_seconds() ,axis=1) df[["date_col_1", "date_col_2", "date_diffs_lambda"]] # date_col_1 date_col_2 date_diffs_lambda # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 90925189.91 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 5194931.78 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 44582400.00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 31285623.00 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 -31285623.00 |
df['date_diffs_lambda'] = df.apply(lambda row: (row['date_col_1'] - row['date_col_2']).total_seconds() ,axis=1) df[["date_col_1", "date_col_2", "date_diffs_lambda"]] # date_col_1 date_col_2 date_diffs_lambda # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 90925189.91 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 5194931.78 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 44582400.00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 31285623.00 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 -31285623.00
As you can see, the final set of data has a negative difference result because the dates are entered in the wrong order. In order to prevent this, we can use a custom function. Next, I’ll explain how to do that…
Example 2: Using a Custom Function
The following function is also used in the tutorial “how to calculate the time difference between two datetimes”, but we will go through it here as well.
The function sets some time units based on the number of seconds and then partitions the total time difference to these manually set time units and returns it in a formatted way.
def date_difference_calculator(row): date1 = row['date_col_1'] date2 = row['date_col_2'] if date1 < date2: diff = date2 - date1 else: diff = date1 - date2 minute = 60 hour = minute * 60 day = hour * 24 # Calculate the amount of full weeks # Use remainder of days to calc hours # Use remainder of hours to calc minutes # Use remainder of minutes to calc seconds days = divmod(diff.total_seconds(), day) hours = divmod(days[1], hour) minutes = divmod(hours[1], minute) seconds = minutes[1] return str(round(days[0])) + " days " + "{:0>2}".format(round(hours[0])) + ":" + "{:0>2}".format(round(minutes[0])) + ":" + "{:0>2}".format(round(seconds)) |
def date_difference_calculator(row): date1 = row['date_col_1'] date2 = row['date_col_2'] if date1 < date2: diff = date2 - date1 else: diff = date1 - date2 minute = 60 hour = minute * 60 day = hour * 24 # Calculate the amount of full weeks # Use remainder of days to calc hours # Use remainder of hours to calc minutes # Use remainder of minutes to calc seconds days = divmod(diff.total_seconds(), day) hours = divmod(days[1], hour) minutes = divmod(hours[1], minute) seconds = minutes[1] return str(round(days[0])) + " days " + "{:0>2}".format(round(hours[0])) + ":" + "{:0>2}".format(round(minutes[0])) + ":" + "{:0>2}".format(round(seconds))
Next, we can use the apply function to use our user-defined function for our DataFrame:
df['date_diffs_custom'] = df.apply(date_difference_calculator, axis=1) df[["date_col_1", "date_col_2", "date_diffs_custom"]] # date_col_1 date_col_2 date_diffs_custom # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 1052 days 08:59:50 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 60 days 03:02:12 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 516 days 00:00:00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 362 days 02:27:03 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 362 days 02:27:03 |
df['date_diffs_custom'] = df.apply(date_difference_calculator, axis=1) df[["date_col_1", "date_col_2", "date_diffs_custom"]] # date_col_1 date_col_2 date_diffs_custom # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 1052 days 08:59:50 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 60 days 03:02:12 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 516 days 00:00:00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 362 days 02:27:03 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 362 days 02:27:03
We can also use built-in functions from imported libraries, so let’s move on to the next example.
Example 3.1: Using relativedelta Library
Another way to calculate the difference between the two pandas DataFrame columns can be using a built-in library like relativedelta.
A major difference compared to the previous example is the usage of the lambda function. A lambda function is a small nameless function, which we can use to do row-based custom operations on the DataFrame.
In this specific example, we need it to use the abs() function to prevent any negative date_diff results.
df['relative_diff'] = df.apply(lambda row: abs(relativedelta.relativedelta(row['date_col_2'], row['date_col_1'])), axis=1) df[["date_col_1","date_col_2", "relative_diff"]] # date_col_1 ... relative_diff # 0 2020-01-24 18:05:25.780 ... relativedelta(years=+2, months=+10, days=+16, ... # 1 2020-03-24 18:05:25.780 ... relativedelta(months=+2, hours=+3, minutes=+2,... # 2 2021-07-24 15:03:14.000 ... relativedelta(years=+1, months=+5) # 3 2020-01-24 18:05:25.000 ... relativedelta(months=+11, days=+28, hours=+2, ... # 4 2019-01-27 15:38:22.000 ... relativedelta(months=+11, days=+28, hours=+2, ... |
df['relative_diff'] = df.apply(lambda row: abs(relativedelta.relativedelta(row['date_col_2'], row['date_col_1'])), axis=1) df[["date_col_1","date_col_2", "relative_diff"]] # date_col_1 ... relative_diff # 0 2020-01-24 18:05:25.780 ... relativedelta(years=+2, months=+10, days=+16, ... # 1 2020-03-24 18:05:25.780 ... relativedelta(months=+2, hours=+3, minutes=+2,... # 2 2021-07-24 15:03:14.000 ... relativedelta(years=+1, months=+5) # 3 2020-01-24 18:05:25.000 ... relativedelta(months=+11, days=+28, hours=+2, ... # 4 2019-01-27 15:38:22.000 ... relativedelta(months=+11, days=+28, hours=+2, ...
Example 3.2: Using relativedelta Library with Extended Formatting
The previous method returned the difference as a relativedelta object, if we want to format the output as a string, then we can use the method in the following example:
df['relative_diff_str'] = df.apply(lambda row: str(row['relative_diff'].years) + " years " + str(row['relative_diff'].months) + " months " + str(row['relative_diff'].days) + " days " + str(row['relative_diff'].hours) + " hours " + str(row['relative_diff'].minutes) + " minutes " + str(row['relative_diff'].seconds) + " seconds", axis = 1) df[["date_col_1","date_col_2", "relative_diff_str"]] # date_col_1 ... relative_diff_str # 0 2020-01-24 18:05:25.780 ... 2 years 10 months 16 days 8 hours 59 minutes 5... # 1 2020-03-24 18:05:25.780 ... 0 years 2 months 0 days 3 hours 2 minutes 12 s... # 2 2021-07-24 15:03:14.000 ... 1 years 5 months 0 days 0 hours 0 minutes 0 se... # 3 2020-01-24 18:05:25.000 ... 0 years 11 months 28 days 2 hours 27 minutes 3... # 4 2019-01-27 15:38:22.000 ... 0 years 11 months 28 days 2 hours 27 minutes 3... |
df['relative_diff_str'] = df.apply(lambda row: str(row['relative_diff'].years) + " years " + str(row['relative_diff'].months) + " months " + str(row['relative_diff'].days) + " days " + str(row['relative_diff'].hours) + " hours " + str(row['relative_diff'].minutes) + " minutes " + str(row['relative_diff'].seconds) + " seconds", axis = 1) df[["date_col_1","date_col_2", "relative_diff_str"]] # date_col_1 ... relative_diff_str # 0 2020-01-24 18:05:25.780 ... 2 years 10 months 16 days 8 hours 59 minutes 5... # 1 2020-03-24 18:05:25.780 ... 0 years 2 months 0 days 3 hours 2 minutes 12 s... # 2 2021-07-24 15:03:14.000 ... 1 years 5 months 0 days 0 hours 0 minutes 0 se... # 3 2020-01-24 18:05:25.000 ... 0 years 11 months 28 days 2 hours 27 minutes 3... # 4 2019-01-27 15:38:22.000 ... 0 years 11 months 28 days 2 hours 27 minutes 3...
Example 3.3: Using relativedelta Library with Compact Formatting
If you think the previous leads to too long outputs, then you might use the following method to generate a more compact output.
The format method is used to fill the output with characters of the user’s choice to set the number of characters.
In this case, if the month difference is 5, then it will be altered to be 05 and so on.
df['relative_diff_str_2'] = df.apply(lambda row: "{:0>4}".format(row['relative_diff'].years) + "-" + "{:0>2}".format(row['relative_diff'].months) + "-" + "{:0>2}".format(row['relative_diff'].days) + " " + "{:0>2}".format(row['relative_diff'].hours) + ":" + "{:0>2}".format(row['relative_diff'].minutes) + ":" + "{:0>2}".format(row['relative_diff'].seconds), axis = 1) df[["date_col_1","date_col_2", "relative_diff_str_2"]] # date_col_1 date_col_2 relative_diff_str_2 # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 0002-10-16 08:59:50 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 0000-02-00 03:02:12 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 0001-05-00 00:00:00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 0000-11-28 02:27:03 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 0000-11-28 02:27:03 |
df['relative_diff_str_2'] = df.apply(lambda row: "{:0>4}".format(row['relative_diff'].years) + "-" + "{:0>2}".format(row['relative_diff'].months) + "-" + "{:0>2}".format(row['relative_diff'].days) + " " + "{:0>2}".format(row['relative_diff'].hours) + ":" + "{:0>2}".format(row['relative_diff'].minutes) + ":" + "{:0>2}".format(row['relative_diff'].seconds), axis = 1) df[["date_col_1","date_col_2", "relative_diff_str_2"]] # date_col_1 date_col_2 relative_diff_str_2 # 0 2020-01-24 18:05:25.780 2017-03-08 09:05:35.870 0002-10-16 08:59:50 # 1 2020-03-24 18:05:25.780 2020-01-24 15:03:14.000 0000-02-00 03:02:12 # 2 2021-07-24 15:03:14.000 2020-02-24 15:03:14.000 0001-05-00 00:00:00 # 3 2020-01-24 18:05:25.000 2019-01-27 15:38:22.000 0000-11-28 02:27:03 # 4 2019-01-27 15:38:22.000 2020-01-24 18:05:25.000 0000-11-28 02:27:03
Video, Further Resources & Summary
Do you need more explanations on how to calculate the time difference in Python? Then you should have a look at the following YouTube video of the Statistics Globe YouTube channel.
The YouTube video will be added soon.
Furthermore, you could have a look at some of the other tutorials on Statistics Globe:
- Dates & Times in Python – datetime Module
- Introduction to timedelta Objects in Python
- Calculate Number of Hours, Minutes & Seconds Between Two datetimes in Python
- Calculate Number of Years, Months & Days Between Two Dates in Python
- Calculate Time Difference Between Two datetime Objects in Python
- Calculate Time Difference in Milliseconds Between Two datetimes
- How to Add & Subtract Weeks to & from Date in Python
- Add Days, Months & Years to datetime Object
This post has shown how to get the time difference between two columns of a pandas DataFrame. In case you have further questions, you may leave a comment below.
This page was created in collaboration with Ömer Ekiz. You may have a look at Ömer’s author page to read more about his academic background and the other articles he has written for Statistics Globe.