Merge Data Frames by Two ID Columns in R (2 Examples)

 

In this article you’ll learn how to combine multiple data frames based on more than one ID column in R.

Let’s take a look at some R code in action!

 

Creation of Example Data

First, I’ll have to create some data that we can use in the following examples:

data1 <- data.frame(ID1 = 1:5,                                 # Create first data frame
                    ID2 = letters[1:5],
                    x1 = c(4, 1, 6, 7, 8),
                    x2 = 9)
data1                                                          # Print first data frame
#   ID1 ID2 x1 x2
# 1   1   a  4  9
# 2   2   b  1  9
# 3   3   c  6  9
# 4   4   d  7  9
# 5   5   e  8  9

As the output above shows, our first example data frame consists of five rows and four columns. The variables ID1 and ID2 will serve as the keys for combining our data frames.

Let’s create a second example data frame:

data2 <- data.frame(ID1 = 3:7,                                 # Create second data frame
                    ID2 = letters[3:7],
                    y1 = c(4, 4, 5, 1, 1),
                    y2 = 5)
data2                                                          # Print second data frame
#   ID1 ID2 y1 y2
# 1   3   c  4  5
# 2   4   d  4  5
# 3   5   e  5  5
# 4   6   f  1  5
# 5   7   g  1  5

The second data frame also contains five rows and four columns, including the two ID columns ID1 and ID2.
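Before merging, it can be useful to check that each combination of ID1 and ID2 occurs only once per data frame and to see which combinations the two data frames share. The following is a minimal sketch of such a check in base R; the object names key_cols, dup1, dup2, and shared_keys are illustrative assumptions and not part of the original examples.

```r
# Recreate the two example data frames from above
data1 <- data.frame(ID1 = 1:5,
                    ID2 = letters[1:5],
                    x1 = c(4, 1, 6, 7, 8),
                    x2 = 9)
data2 <- data.frame(ID1 = 3:7,
                    ID2 = letters[3:7],
                    y1 = c(4, 4, 5, 1, 1),
                    y2 = 5)

key_cols <- c("ID1", "ID2")                             # Illustrative vector of key column names

dup1 <- anyDuplicated(data1[key_cols])                  # 0 means no repeated ID combination
dup2 <- anyDuplicated(data2[key_cols])                  # Same check for second data frame

shared_keys <- merge(data1[key_cols], data2[key_cols])  # ID pairs present in both data frames
shared_keys                                             # Print shared ID combinations
```

If anyDuplicated() returned a value other than 0, a merge on these columns could produce more rows than either input, so this check is worth running on real data.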

 

Example 1: Combine Data by Two ID Columns Using merge() Function

In Example 1, I’ll illustrate how to apply the merge function to combine data frames based on multiple ID columns. For this, we have to set the by argument of the merge function to a vector of ID column names (i.e. by = c("ID1", "ID2")).

data_merge1 <- merge(data1, data2, by = c("ID1", "ID2"))       # Applying merge() function
data_merge1                                                    # Print merged data
#   ID1 ID2 x1 x2 y1 y2
# 1   3   c  6  9  4  5
# 2   4   d  7  9  4  5
# 3   5   e  8  9  5  5

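Have a look at the previous output of the RStudio console. We have created a merged data frame based on two ID columns. It contains only the three rows whose combination of ID1 and ID2 appears in both input data frames.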
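To illustrate why both ID columns belong in the by argument, the sketch below compares a merge on both keys with a merge on ID1 alone. The data frame data3 is a hypothetical example created only for this comparison: its second row matches data1 on ID1 but not on ID2.

```r
# Recreate the first example data frame from above
data1 <- data.frame(ID1 = 1:5,
                    ID2 = letters[1:5],
                    x1 = c(4, 1, 6, 7, 8),
                    x2 = 9)

# Hypothetical data frame: row 2 has a matching ID1 but a different ID2
data3 <- data.frame(ID1 = 3:5,
                    ID2 = c("c", "x", "e"),
                    y1 = c(4, 4, 5))

merge_both <- merge(data1, data3, by = c("ID1", "ID2"))  # Match on both ID columns
merge_one  <- merge(data1, data3, by = "ID1")            # Match on ID1 only

nrow(merge_both)                                         # Rows matched on both keys
nrow(merge_one)                                          # Rows matched on ID1 alone
names(merge_one)                                         # ID2 is kept twice with suffixes
```

When only ID1 is used, the mismatching row is kept and the non-key column ID2 appears twice, as ID2.x and ID2.y. Merging on both columns drops the row where the ID combination differs.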

 

Example 2: Combine Data by Two ID Columns Using inner_join() Function of dplyr Package

This example illustrates how to use the dplyr package to merge data by two ID columns.

First, we need to install and load the dplyr package:

install.packages("dplyr")                                      # Install dplyr package
library("dplyr")                                               # Load dplyr package

Now, we can apply the inner_join function to create exactly the same output as in Example 1:

data_merge2 <- inner_join(data1, data2, by = c("ID1", "ID2"))  # Applying inner_join() function
data_merge2                                                    # Print merged data
#   ID1 ID2 x1 x2 y1 y2
# 1   3   c  6  9  4  5
# 2   4   d  7  9  4  5
# 3   5   e  8  9  5  5

Note that the previous examples performed an inner join. However, it is also possible to apply other types of data joins such as left joins, right joins, outer joins, and so on.
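As a rough sketch of these other join types, the base R merge function can be switched to a left, right, or full outer join with its all.x, all.y, and all arguments. The object names merge_left, merge_right, and merge_full are illustrative assumptions. The dplyr package offers left_join(), right_join(), and full_join() as counterparts to inner_join().

```r
# Recreate the two example data frames from above
data1 <- data.frame(ID1 = 1:5,
                    ID2 = letters[1:5],
                    x1 = c(4, 1, 6, 7, 8),
                    x2 = 9)
data2 <- data.frame(ID1 = 3:7,
                    ID2 = letters[3:7],
                    y1 = c(4, 4, 5, 1, 1),
                    y2 = 5)

merge_left <- merge(data1, data2,                        # Left join: keep all rows of data1
                    by = c("ID1", "ID2"),
                    all.x = TRUE)
merge_right <- merge(data1, data2,                       # Right join: keep all rows of data2
                     by = c("ID1", "ID2"),
                     all.y = TRUE)
merge_full <- merge(data1, data2,                        # Full outer join: keep all rows
                    by = c("ID1", "ID2"),
                    all = TRUE)

merge_full                                               # Unmatched cells are filled with NA
```

Rows without a partner in the other data frame are retained and their missing values are set to NA, which is why the full outer join returns more rows than the inner join of Examples 1 and 2.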

 

Video, Further Resources & Summary

Do you want to learn more about merging data? Then I recommend having a look at the following video on my YouTube channel, in which I explain the content of this tutorial.

 

The YouTube video will be added soon.

 

In this tutorial, you learned how to join data frames based on two ID variables in the R programming language. If you have further questions, let me know in the comments.

 
