Specify Column Names for X & Y when Joining with dplyr Package in R (Example)
This article explains how to specify the names of the key columns of both data frames when joining them with the dplyr package in the R programming language.
Let’s just jump right in.
Exemplifying Data & Packages
Let’s first construct two example data frames in R:
```r
data1 <- data.frame(ID_1 = 1:5,                  # Example data 1
                    x1 = letters[1:5],
                    x2 = 3)
data1                                            # Print example data 1
#   ID_1 x1 x2
# 1    1  a  3
# 2    2  b  3
# 3    3  c  3
# 4    4  d  3
# 5    5  e  3

data2 <- data.frame(ID_2 = 3:7,                  # Example data 2
                    y1 = 5:1,
                    y2 = 7)
data2                                            # Print example data 2
#   ID_2 y1 y2
# 1    3  5  7
# 2    4  4  7
# 3    5  3  7
# 4    6  2  7
# 5    7  1  7
```
Have a look at the previous RStudio console output. It shows that our two data frames have different column names for the ID-variables (i.e. ID_1 and ID_2).
We also need to install and load the dplyr package in order to use the functions it provides.
```r
install.packages("dplyr")                        # Install dplyr package
library("dplyr")                                 # Load dplyr
```
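Before joining, it can help to confirm why a plain join does not work for these data. The following sketch (assuming a current dplyr version is installed) checks that the two data frames share no column names and shows that calling left_join() without a by argument stops with an error instead of guessing a key. The object names common_cols and join_attempt are only illustrative.

```r
library(dplyr)

# Same example data as above
data1 <- data.frame(ID_1 = 1:5, x1 = letters[1:5], x2 = 3)
data2 <- data.frame(ID_2 = 3:7, y1 = 5:1, y2 = 7)

# The two data frames have no column names in common...
common_cols <- intersect(names(data1), names(data2))
common_cols
# character(0)

# ...so a join without a by argument has nothing to match on and throws an error,
# which we catch here so we can inspect it
join_attempt <- tryCatch(left_join(data1, data2),
                         error = function(e) e)
inherits(join_attempt, "error")
# [1] TRUE
```

The exact wording of the error message differs between dplyr versions, which is why the sketch only checks that an error occurred.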
Finally, let’s merge our data…
Example: Specify Names of Joined Columns Using dplyr Package
The following R syntax shows how to do a left join when the ID columns of the two data frames have different names. We simply need to specify by = c("ID_1" = "ID_2") within the left_join function, as shown below:
```r
data_join <- left_join(data1, data2,             # Setting names
                       by = c("ID_1" = "ID_2"))
data_join                                        # Print joined data
#   ID_1 x1 x2 y1 y2
# 1    1  a  3 NA NA
# 2    2  b  3 NA NA
# 3    3  c  3  5  7
# 4    4  d  3  4  7
# 5    5  e  3  3  7
```
Have a look at the previous output of the RStudio console. Our two data frames were merged, even though their ID columns have different names. Note that the joined data frame keeps the ID name of the first input data frame (i.e. ID_1), while the column ID_2 does not appear in the output.
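Two variations of this join may also be useful. The sketch below assumes dplyr 1.1.0 or later, because the join_by() helper was introduced in that release. It writes the same key mapping as join_by(ID_1 == ID_2), and it uses the keep = TRUE argument of left_join() to retain the ID column of both data frames instead of only the one from the first data frame. The object names data_join_jb and data_join_keep are hypothetical.

```r
library(dplyr)

data1 <- data.frame(ID_1 = 1:5, x1 = letters[1:5], x2 = 3)
data2 <- data.frame(ID_2 = 3:7, y1 = 5:1, y2 = 7)

# join_by() (dplyr >= 1.1.0) expresses the same mapping as by = c("ID_1" = "ID_2")
data_join_jb <- left_join(data1, data2, by = join_by(ID_1 == ID_2))
identical(data_join_jb, left_join(data1, data2, by = c("ID_1" = "ID_2")))
# [1] TRUE

# keep = TRUE retains the key column of the second data frame as well
data_join_keep <- left_join(data1, data2, by = c("ID_1" = "ID_2"), keep = TRUE)
data_join_keep$ID_2                              # NA where data1 had no match
# [1] NA NA  3  4  5
```

Keeping both key columns can be handy when you want to see directly which rows of the first data frame found a match in the second one.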
Video, Further Resources & Summary
Some time ago, I published a video on my YouTube channel that illustrates the topics of this tutorial. The video will be added to this page soon.
In addition, you could have a look at the related tutorials on this website.
Summary: At this point of the tutorial, you should have learned how to specify differently named key columns for the two data frames when merging them with the dplyr package in the R programming language. Let me know in the comments section if you have further comments or questions. Furthermore, don’t forget to subscribe to my email newsletter to receive updates on new tutorials.