Specify Column Names for X & Y when Joining with dplyr Package in R (Example)

 

This article explains how to define variable names for both data frames in a dplyr join in the R programming language.


Let’s just jump right in.

 

Exemplifying Data & Packages

Let’s first construct two example data frames in R:

data1 <- data.frame(ID_1 = 1:5,                                # Example data 1
                    x1 = letters[1:5],
                    x2 = 3)
data1                                                          # Print example data 1
#   ID_1 x1 x2
# 1    1  a  3
# 2    2  b  3
# 3    3  c  3
# 4    4  d  3
# 5    5  e  3
data2 <- data.frame(ID_2 = 3:7,                                # Example data 2
                    y1 = 5:1,
                    y2 = 7)
data2                                                          # Print example data 2
#   ID_2 y1 y2
# 1    3  5  7
# 2    4  4  7
# 3    5  3  7
# 4    6  2  7
# 5    7  1  7

Have a look at the previous RStudio console output. It shows that our two data frames have different column names for the ID-variables (i.e. ID_1 and ID_2).
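Why does this matter? When no by argument is given, dplyr joins on the column names that both data frames share. The following minimal sketch (the object name common_cols is just an illustrative choice) checks which column names our two example data frames have in common:

```r
# Recreate the two example data frames
data1 <- data.frame(ID_1 = 1:5,
                    x1 = letters[1:5],
                    x2 = 3)
data2 <- data.frame(ID_2 = 3:7,
                    y1 = 5:1,
                    y2 = 7)

# Column names that appear in both data frames
common_cols <- intersect(names(data1), names(data2))
common_cols                                                    # Print shared column names
length(common_cols)                                            # Number of shared column names
```

Since the two data frames share no column name, a join without an explicit by specification has nothing to match on. That is why we have to tell dplyr which columns belong together.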

We also have to install and load the dplyr package in order to use the functions that it contains:

install.packages("dplyr")                                      # Install dplyr package
library("dplyr")                                               # Load dplyr

Finally, let’s merge our data.

 

Example: Specify Names of Joined Columns Using dplyr Package

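The following R syntax shows how to do a left join when the ID columns of the two data frames have different names. We simply need to specify by = c("ID_1" = "ID_2") within the left_join function, where the name on the left refers to the first data frame and the name on the right refers to the second data frame, as shown below: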

data_join <- left_join(data1, data2, by = c("ID_1" = "ID_2"))  # Setting names
data_join                                                      # Print joined data
#   ID_1 x1 x2 y1 y2
# 1    1  a  3 NA NA
# 2    2  b  3 NA NA
# 3    3  c  3  5  7
# 4    4  d  3  4  7
# 5    5  e  3  3  7

Have a look at the previous output of the RStudio console. Our two data frames were merged, even though they had different ID-names. Note that the ID-name in the joined data frame is the same as in the first input data frame.
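Two variations of this join might be useful in practice. The sketch below is not part of the original example: it assumes dplyr 1.1.0 or newer for the join_by function, and the object names data_keep and data_join2 are chosen only for illustration. The keep argument retains the ID columns of both data frames, and join_by offers an alternative way to write the same column specification.

```r
library("dplyr")                                               # Load dplyr

# Recreate the two example data frames
data1 <- data.frame(ID_1 = 1:5,
                    x1 = letters[1:5],
                    x2 = 3)
data2 <- data.frame(ID_2 = 3:7,
                    y1 = 5:1,
                    y2 = 7)

# Variation 1: keep the ID columns of both data frames
data_keep <- left_join(data1, data2,
                       by = c("ID_1" = "ID_2"),
                       keep = TRUE)
names(data_keep)                                               # Print column names

# Variation 2: same join using join_by (assumes dplyr >= 1.1.0)
data_join2 <- left_join(data1, data2,
                        by = join_by(ID_1 == ID_2))
data_join2                                                     # Print joined data
```

With keep = TRUE, the ID_2 column is NA in the rows of data1 that have no match in data2, which makes it easy to spot the unmatched rows.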

 

Video, Further Resources & Summary

Some time ago, I published a video on my YouTube channel that illustrates the topics of this tutorial. You can find the video below.

 

The YouTube video will be added soon.

 


 

Summary: At this point of the tutorial, you should have learned how to specify the column names in a join with the dplyr package in the R programming language. Let me know in the comments section if you have further comments or questions.

 
