Join Multiple data.tables in R (6 Examples)

 

This article explains how to merge multiple data.tables in various ways in R programming. We show the different possible ways of merging data.tables with two and three data.tables.

Table of contents:

So now the part you have been waiting for – the examples.

 

Example Data & Packages

We first have to install and load the data.table package:

install.packages("data.table")                    # Install & load data.table package
library("data.table")

We also have to create some data that we can use in the example syntax later on:

DT1 <- data.table(V1 = 1:4,                       # Create data.table
                  V2 = c("a", "a", "b", "b"),
                  V3 = c("c", "d", "c", "d"))
DT1                                               # Print data.table

 

table 1 data frame join multiple data tables

 

Table 1 shows the structure of the exemplifying data: It is made of four rows and three columns.

DT2 <- data.table(V1 = 2:5,                       # Create data.table
                  V4 = 5:6,
                  V5 = c(FALSE, TRUE, TRUE, TRUE))
DT2                                               # Print data.table

 

table 2 data frame join multiple data tables

 

In Table 2 it is shown that we have created a data.table with the previous R programming syntax. Take a look at variable V1, we can see this variable as an identifier. Comparing DT1 to DT2 we see that DT1 and DT2 share information on observations with V1=2,3,4.

Finally, we generate a third data.table called DT3.

DT3 <- data.table(V1 = 4:6, # Create data.table
                  V2 = c("b", "b", "a"),
                  V6 = c(10, 11, 12))
DT3 # Print data.table

 

table 3 data frame join multiple data tables

 

As you can see from Table 3, we again have a column V1 to identify observations in the three data.tables DT1, DT2, and DT3.

To get further information on data.table: Take a look at our data.table page here for more data.table tutorials. If you want to have a more visual intuition of the different ways of combining data.tables in R, have a look at this R Studio post. Furthermore, you can find the CRAN page of the package here and the documentation here.

 

Example 1: Inner Join of Two data.tables

In Example 1, I’ll demonstrate how to make an inner join of two data.tables with merge().

DT1_2_inner <- merge(DT1, DT2, all = FALSE)       # Inner join
DT1_2_inner                                       # Print data.table

 

table 4 data frame join multiple data tables

 

Table 4 shows the output of the previous syntax – the inner join. Function merge() automatically takes the common variable V1 as the identifier variable of the two data.tables. As we set argument all = FALSE, we indicate that we only want to merge those data rows with the common values of identifier V1.

 

Example 2: Left Outer Join of Two data.tables

In this section, I’ll illustrate how to make a left outer join.

DT1_2_out_left <- merge(DT1, DT2, all.x = TRUE)   # Outer join: Left
DT1_2_out_left                                    # Print data.table

 

table 5 data frame join multiple data tables

 

After executing the previously shown syntax the data.table shown in Table 5 has been created. We made a left outer join. That is, we take DT1 and then add all additional information of these data rows (identified with common variable V1) available from DT2.

 

Example 3: Right Outer Join of Two data.tables

The following R programming syntax explains how to do a right join.

DT1_2_out_right <- merge(DT1, DT2, all.y = TRUE)  # Outer join: Right
DT1_2_out_right                                   # Print data.table

 

table 6 data frame join multiple data tables

 

The output of the previous R code is shown in Table 6. This time we took the information of DT2 and merged to it the information of DT1. All unavailable information is set to NA.

 

Example 4: Full Join of Two data.tables

In this example, I’ll explain how to do a full join of two data.tables.

DT1_2_full <- merge(DT1, DT2, all = TRUE)         # Full join
DT1_2_full                                        # Print data.table

 

table 7 data frame join multiple data tables

 

Table 7 shows the output of the previous R programming syntax. With argument all = TRUE we indicate that we want to merge all available information from the two datasets. All missing information is set to NA.

 

Example 5: Full Join of Three data.tables

The following code shows how to merge more than two data.tables at the same time. In the example, we conduct a full join, merging all information from three data.tables. Remember: It is important that we have a common variable V1, indicating how to match the information.

DT1_2_3_full <- Reduce(function (...) { merge(..., all = TRUE) },   # Full join
                       list(DT1, DT2, DT3)) 
DT1_2_3_full                                                        # Print data.table

 

table 8 data frame join multiple data tables

 

The resulting merged data.table is shown in Table 8.

 

Example 6: Inner Join of Three data.tables

As a last example, we make an inner join of three data.tables. For that, we simply set argument all = FALSE in the merge() function.

DT1_2_3_inner <- Reduce(function (...) { merge(..., all = FALSE) },   # Inner join
                        list(DT1, DT2, DT3))
DT1_2_3_inner                                                         # Print data.table

 

table 9 data frame join multiple data tables

 

You can see that the three data.tables only have V1=4 in common.

 

Video, Further Resources & Summary

Do you want to learn more about the merging of data.tables in R? Then I recommend having a look at the following video on my YouTube channel. In the video, I demonstrate the R programming codes of this tutorial:

 

The YouTube video will be added soon.

 

Furthermore, you could have a look at the related tutorials on this homepage:

 

You have learned in this article how to combine data.tables in R. If you have any additional comments and/or questions, don’t hesitate to let me know in the comments section.

 

Anna-Lena Wölwer Survey Statistician & R Programmer

This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get further information about her academic background and the other articles she has written for Statistics Globe.

 

2 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

The maximum upload file size: 2 MB. You can upload: image. Drop file here

Top