Merge Multiple pandas DataFrames in Python (2 Examples)

 

In this Python tutorial you’ll learn how to join three or more pandas DataFrames.

Let’s get started:

 

Example Data & Software Libraries

We first need to load the pandas library, to be able to use the corresponding functions:

import pandas as pd                          # Load pandas library

Let’s also create several example DataFrames in Python:

data1 = pd.DataFrame({"ID": range(10, 16),   # Create first pandas DataFrame
                      "x1": range(100, 106),
                      "x2": ["a", "b", "c", "d", "e", "f"],
                      "x3": range(27, 21, -1)})
print(data1)                                 # Print first pandas DataFrame

 

Table 1: The first example pandas DataFrame (data1).

 

data2 = pd.DataFrame({"ID": range(14, 19),   # Create second pandas DataFrame
                      "y1": ["x", "y", "x", "x", "y"],
                      "y2": range(20, 25),
                      "y3": range(10, 1, -2)})
print(data2)                                 # Print second pandas DataFrame

 

Table 2: The second example pandas DataFrame (data2).

 

data3 = pd.DataFrame({"ID": range(12, 20),   # Create third pandas DataFrame
                      "z1": range(111, 119),
                      "z2": range(10, 2, -1)})
print(data3)                                 # Print third pandas DataFrame

 

Table 3: The third example pandas DataFrame (data3).

 

As shown in Tables 1, 2, and 3, the previous code has created three different pandas DataFrames. All of these DataFrames contain an ID column that we will use to combine the DataFrames in the following examples.
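
Before merging, it can help to check how the ID values of the three DataFrames overlap, because this overlap determines how many rows each join type returns. The following is only a minimal sketch: it rebuilds just the ID columns of the example data, and the names id_sets, ids_common, and ids_all are illustrative and not part of the original tutorial.

```python
import pandas as pd

# Only the ID columns of the example data are needed for this check
data1 = pd.DataFrame({"ID": range(10, 16)})
data2 = pd.DataFrame({"ID": range(14, 19)})
data3 = pd.DataFrame({"ID": range(12, 20)})

# Convert each ID column to a set of plain Python integers
id_sets = [set(df["ID"].tolist()) for df in (data1, data2, data3)]

ids_common = set.intersection(*id_sets)   # IDs present in all three DataFrames
ids_all = set.union(*id_sets)             # IDs present in at least one DataFrame

print(sorted(ids_common))                 # [14, 15]
print(len(ids_all))                       # 10
```

Only the IDs 14 and 15 occur in all three DataFrames, while ten different IDs occur overall.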

Before we can jump into the merging process, we also have to import the reduce function from the functools module:

from functools import reduce                 # Import reduce function
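
The reduce function applies a function of two arguments cumulatively from left to right over a list. As a small illustration (using strings instead of real DataFrames, purely as an assumption to make the evaluation order visible), the sketch below shows how the calls are nested:

```python
from functools import reduce

# reduce(f, [a, b, c]) is evaluated as f(f(a, b), c)
call_order = reduce(lambda left, right: f"merge({left}, {right})",
                    ["data1", "data2", "data3"])

print(call_order)                         # merge(merge(data1, data2), data3)
```

Applied to DataFrames, this means that the first two DataFrames are merged first, and the result is then merged with the third one.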

Now, we are set up and can move on to the examples!

 

Example 1: Merge Multiple pandas DataFrames Using Inner Join

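The following Python programming code illustrates how to perform an inner join to combine three different data sets in Python.

For this, we can apply the Python syntax below. The reduce function calls pd.merge step by step, i.e. it first merges data1 and data2, and then merges the result with data3. Since "inner" is the default of the how argument of pd.merge, we do not have to specify the join type explicitly: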

data_merge1 = reduce(lambda left, right:     # Merge three pandas DataFrames
                     pd.merge(left, right,
                              on="ID"),
                     [data1, data2, data3])
print(data_merge1)                           # Print merged DataFrame

 

Table 4: The three example DataFrames merged by an inner join (data_merge1).

 

The output of the previous Python syntax is shown in Table 4. We have joined our three input DataFrames on the ID column, so that the columns of all three DataFrames appear side by side.

As you can see, only the rows with the IDs 14 and 15 are left, since an inner join keeps only those IDs that occur in all three input DataFrames.
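
As a cross-check, the same inner join can also be written as a chain of DataFrame.merge calls. The sketch below is not part of the original example: it uses reduced versions of the example DataFrames (one value column each), and the names merged_reduce and merged_chain are chosen only for illustration.

```python
import pandas as pd
from functools import reduce

# Reduced versions of the example DataFrames (one value column each)
data1 = pd.DataFrame({"ID": range(10, 16), "x1": range(100, 106)})
data2 = pd.DataFrame({"ID": range(14, 19), "y2": range(20, 25)})
data3 = pd.DataFrame({"ID": range(12, 20), "z1": range(111, 119)})

# Inner join via reduce, as in Example 1
merged_reduce = reduce(lambda left, right: pd.merge(left, right, on="ID"),
                       [data1, data2, data3])

# The same inner join written as chained merge calls
merged_chain = data1.merge(data2, on="ID").merge(data3, on="ID")

print(merged_chain["ID"].tolist())        # [14, 15]
print(merged_reduce.equals(merged_chain)) # True
```

The chained version is convenient for two or three DataFrames, whereas reduce scales to a list of any length.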

In the next example, I’ll explain how to keep as much data as possible.

 

Example 2: Merge Multiple pandas DataFrames Using Outer Join

In Example 2, I’ll show how to combine multiple pandas DataFrames using an outer join (also called full join).

To do this, we have to set the how argument within the merge function to be equal to “outer”:

data_merge2 = reduce(lambda left, right:     # Merge three pandas DataFrames
                     pd.merge(left, right,
                              on="ID",
                              how="outer"),
                     [data1, data2, data3])
print(data_merge2)                           # Print merged DataFrame

 

Table 5: The three example DataFrames merged by an outer join (data_merge2).

 

After executing the previous Python syntax, the merged pandas DataFrame shown in Table 5 has been created.

This time, we have kept all IDs of our input data sets, i.e. the IDs 10 to 19. Whenever an ID does not occur in one of the input DataFrames, the columns of that DataFrame are set to NaN in the corresponding row.
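
To see how many values are missing after an outer join, the NaN values per column can be counted. This is a minimal sketch that again uses reduced versions of the example DataFrames; the name data_outer is an assumption for illustration only.

```python
import pandas as pd
from functools import reduce

# Reduced versions of the example DataFrames (one value column each)
data1 = pd.DataFrame({"ID": range(10, 16), "x1": range(100, 106)})
data2 = pd.DataFrame({"ID": range(14, 19), "y2": range(20, 25)})
data3 = pd.DataFrame({"ID": range(12, 20), "z1": range(111, 119)})

# Outer join via reduce, as in Example 2
data_outer = reduce(lambda left, right: pd.merge(left, right, on="ID", how="outer"),
                    [data1, data2, data3])

print(data_outer.shape)                   # (10, 4)
print(data_outer.isna().sum())            # Number of NaN values per column
print(data_outer.dtypes)                  # Data types after the merge
```

The columns x1, y2, and z1 contain 4, 5, and 2 missing values, respectively. Note that integer columns that receive NaN values are usually converted to a floating point type by pandas, which is why values such as 100.0 may appear in the merged output.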

 

Summary

In this article you have learned how to merge three or more pandas DataFrames in the Python programming language, using either an inner join or an outer join.