R Merge Multiple Data Frames in List (2 Examples) | Base R vs. tidyverse

 

On this page you’ll learn how to simultaneously merge multiple data frames in a list in the R programming language.

The tutorial contains two examples: one based on base R and one based on the tidyverse.

Let’s do this!

 

Exemplifying Data

Before we can start with the merging, we need to create some example data. Let’s first create three data frames in R…

data1 <- data.frame(id = 1:6,                                  # Create first example data frame
                    x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
 
data2 <- data.frame(id = 4:9,                                  # Create second example data frame
                    y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
 
data3 <- data.frame(id = 5:6,                                  # Create third example data frame
                    z1 = c(3, 2),
                    z2 = c("K", "b"))

…and then let’s store these data frames in a list:

data_list <- list(data1, data2, data3)                         # Combine data frames to list
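Before merging, it can be worth confirming that every list element is a data frame that contains the key column. The following is only a minimal sketch: the helper name has_key and the object key_check are made up for this illustration and are not part of the original example.

```r
# Recreate the example list so that this snippet runs on its own
data_list <- list(
  data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y")),
  data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x")),
  data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
)

# Hypothetical helper: TRUE if x is a data frame that has the key column
has_key <- function(x, key = "id") {
  is.data.frame(x) && key %in% names(x)
}

key_check <- vapply(data_list, has_key, logical(1))            # One TRUE/FALSE per list element
key_check                                                      # Print check results
length(data_list)                                              # Number of data frames in list
```

If one of the values in key_check were FALSE, the merge by "id" shown later would fail for that list element.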

 

Example 1: Merge List of Multiple Data Frames with Base R

If we want to merge a list of data frames with Base R, we need to perform two steps.

First, we need to create our own merging function. Note that we have to specify the column based on which we want to join our data within this function (i.e. “id”):

my_merge <- function(df1, df2){                                # Create own merging function
  merge(df1, df2, by = "id")
}

Then, we need to pass our own function and our list of data frames to the Reduce function:

Reduce(my_merge, data_list)                                    # Apply Reduce to own function
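To see what Reduce does with the list, the call can be unrolled by hand: the first two data frames are merged, and the intermediate result is then merged with the third data frame. The sketch below repeats the example data so that it runs on its own; the object names step1, step2, and reduced are only used for this illustration.

```r
data1 <- data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
data_list <- list(data1, data2, data3)

my_merge <- function(df1, df2) {                               # Same merging function as above
  merge(df1, df2, by = "id")
}

step1 <- my_merge(data1, data2)                                # Keeps ids 4, 5, 6
step2 <- my_merge(step1, data3)                                # Keeps ids 5, 6

reduced <- Reduce(my_merge, data_list)                         # Same two steps in one call
identical(step2, reduced)                                      # Compare manual and Reduce result
```

Because each step is an inner join, only the ids that occur in all three data frames (5 and 6) remain in the final result.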

 


Table 1: Three Merged Data Frames of List.

 

Table 1 shows the result of the merging process. Note that the previous R code conducted an inner join. However, we could also specify a left, right, or full join within our user-defined function by setting the all.x, all.y, or all argument of the merge function to TRUE. You can learn more about the different join types in the tutorials on the merge function and on the dplyr join functions.
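As a rough sketch of how other join types could look in Base R, the merging function can be changed to use the all or all.x arguments of merge. The function names full_merge and left_merge as well as the result names are assumptions made for this illustration; the example data is repeated so that the snippet runs on its own.

```r
data_list <- list(
  data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y")),
  data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x")),
  data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
)

full_merge <- function(df1, df2) {                             # Full join: keep all ids
  merge(df1, df2, by = "id", all = TRUE)
}

left_merge <- function(df1, df2) {                             # Left join: keep ids of df1
  merge(df1, df2, by = "id", all.x = TRUE)
}

data_full <- Reduce(full_merge, data_list)                     # Rows for ids 1 to 9
data_left <- Reduce(left_merge, data_list)                     # Rows for ids 1 to 6

data_full                                                      # Print full join
data_left                                                      # Print left join
```

Cells without a matching id in one of the data frames are filled with NA.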

You think the previous code was a bit complicated? Then you may prefer the code of the next example. So keep reading…

 

Example 2: Merge List of Multiple Data Frames with tidyverse

Example 1 relied on the basic installation of R. However, the tidyverse provides a very smooth and simple solution for combining multiple data frames in a list. Let’s install and load the tidyverse packages (to be precise, we need the dplyr and purrr packages for the following example):

install.packages("tidyverse")                                  # Install tidyverse package
library("tidyverse")                                           # Load tidyverse package

Now, we can use the reduce function of the purrr package (note the lower case r) together with the inner_join function of the dplyr package in order to join our multiple data sets in one line of R syntax:

data_list %>% reduce(inner_join, by = "id")                    # Apply reduce function of tidyverse
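The same idea can also be written with explicit package prefixes, which does not depend on the packages being attached with library. An error such as "object 'inner_join' not found" may indicate that dplyr has not been loaded, and the prefixed form sidesteps that. This is a sketch under the assumption that the dplyr and purrr packages are installed; the result names data_inner and data_full are chosen for this illustration.

```r
data_list <- list(
  data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y")),
  data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x")),
  data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
)

# Inner join with explicit namespaces; by = "id" is passed on to inner_join
data_inner <- purrr::reduce(data_list, dplyr::inner_join, by = "id")

# Full join, written with an anonymous function instead of passing by via ...
data_full <- purrr::reduce(data_list,
                           function(x, y) dplyr::full_join(x, y, by = "id"))

data_inner                                                     # Rows for ids 5 and 6
data_full                                                      # Rows for ids 1 to 9
```

Swapping inner_join for left_join or right_join follows the same pattern.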

Much easier than Base R if you ask me, but that’s probably a matter of taste 🙂

 

Video & Further Resources

Please have a close look at the following video of my YouTube channel. I illustrate the contents that I have shown in this R tutorial in the video in more detail.

 


 

In addition to the video, I recommend reading some of the other articles on the Statistics Globe website.

 

On this page you learned how to merge multiple data frames in a list using Base R and the tidyverse. If you have any further comments or questions, please let me know in the comments section.

 



4 Comments

  • Hi Joachim,

    I want to fully join two datasets but keep the values of both.

    data1 <- data.frame(id = 1:6,
    x1 = c(5, 1, 4, 9, 1, 2),
    x2 = c("A", "Y", "G", "F", "G", "Y"))

    data2 <- data.frame(id = 1:6,
    x1 = c(5, 1, 3, 9, 7, 2),
    x2 = c("K", "Y", "G", "T", "G", "L"))
    I want to join both data frames without losing any information. So, in case the value is identical, then keep it, but in case the values are different then store them as a list.

    Could you help me out here?

  • Thank you for the tutorial. However, “inner_join” in the tidyverse isn’t working. I ran the same code that you provided.
    Error in as_mapper(.f, …) : object ‘inner_join’ not found
    I have a big list of RNAseq data, and it is a bit hard for me to type the name of each list element. Could you please help me with that? I would like to use the tidyverse to merge my list.

    • Hey Iroda,

      Based on your error message, it seems like you have specified the function inner_join in the wrong place. inner_join is a function and not an object, as indicated by your error message. Please verify that you have used the functions in your code properly.

      Regards,
      Joachim

