R Merge Multiple Data Frames in List (2 Examples) | Base R vs. tidyverse

 

On this page you’ll learn how to simultaneously merge multiple data frames in a list in the R programming language.

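The tutorial contains two examples. More precisely, the content is structured as follows:

1) Exemplifying Data
2) Example 1: Merge List of Multiple Data Frames with Base R
3) Example 2: Merge List of Multiple Data Frames with tidyverse
4) Video & Further Resources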

Let’s do this!

 

Exemplifying Data

Before we can start with the merging, we need to create some example data. Let’s first create three data frames in R…

data1 <- data.frame(id = 1:6,                                  # Create first example data frame
                    x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
 
data2 <- data.frame(id = 4:9,                                  # Create second example data frame
                    y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
 
data3 <- data.frame(id = 5:6,                                  # Create third example data frame
                    z1 = c(3, 2),
                    z2 = c("K", "b"))

…and then let’s store these data frames in a list:

data_list <- list(data1, data2, data3)                         # Combine data frames to list
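If you have many data frames whose names follow a pattern, typing every name into list() quickly gets tedious. As an alternative, here is a minimal sketch that assumes the data frames are named data1, data2, data3 in the current environment. It collects them by name with base R's mget(), which returns a named list that can be used in the same way in both examples below:

```r
# Example data as created above
data1 <- data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))

# Build the object names "data1", "data2", "data3" and fetch the objects by name;
# mget() returns a named list, so each element keeps its original name
data_list <- mget(paste0("data", 1:3))

names(data_list)
```

For a different number of data frames, only the 1:3 sequence needs to change.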

 

Example 1: Merge List of Multiple Data Frames with Base R

If we want to merge a list of data frames with Base R, we need to perform two steps.

First, we need to create our own merging function. Within this function, we have to specify the column by which we want to join our data (i.e. “id”):

my_merge <- function(df1, df2){                                # Create own merging function
  merge(df1, df2, by = "id")
}

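Then, we pass our own function and the list of data frames to the Reduce function. Reduce merges the first two data frames, then merges that result with the third data frame, and so on: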

Reduce(my_merge, data_list)                                    # Apply Reduce to own function
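To make the Reduce step more concrete, the following sketch checks that Reduce(my_merge, data_list) gives exactly the same result as nesting the merges by hand. The inline anonymous-function version at the end is an alternative we add for illustration. It is not part of the original code:

```r
# Example data as created above
data1 <- data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
data_list <- list(data1, data2, data3)

my_merge <- function(df1, df2) merge(df1, df2, by = "id")

# Reduce() folds the list from left to right:
# Reduce(my_merge, list(a, b, c)) is my_merge(my_merge(a, b), c)
via_reduce <- Reduce(my_merge, data_list)
by_hand    <- my_merge(my_merge(data1, data2), data3)
identical(via_reduce, by_hand)    # TRUE

# The merging function can also be written inline as an anonymous function
inline <- Reduce(function(x, y) merge(x, y, by = "id"), data_list)
identical(via_reduce, inline)     # TRUE

via_reduce
```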

 


Table 1: Three Merged Data Frames of List.

 

Table 1 shows the result of the merging process. Note that the previous R code performed an inner join, i.e. only the ids that appear in all three data frames (5 and 6) are kept. However, we could also specify a right, left, or full join within our user-defined function. Learn more about joining data with different join types here (merge function) and here (dplyr functions).
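As an example of another join type, here is a minimal sketch of a full (outer) join. The merge() function does this with all = TRUE (all.x = TRUE gives a left join and all.y = TRUE a right join). Every id that occurs in any of the data frames is kept, and the missing cells are filled with NA. The helper name merge_all is our own hypothetical choice:

```r
# Example data as created above
data1 <- data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
data_list <- list(data1, data2, data3)

# Hypothetical helper: full outer join on "id"
# (all = TRUE keeps unmatched rows from both data frames)
merge_all <- function(df1, df2) {
  merge(df1, df2, by = "id", all = TRUE)
}

full <- Reduce(merge_all, data_list)
full    # ids 1 to 9; z1 and z2 are NA everywhere except ids 5 and 6
```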

Do you think the previous code was a bit complicated? Then you may prefer the code of the next example. So keep reading…

 

Example 2: Merge List of Multiple Data Frames with tidyverse

Example 1 relied only on base R, i.e. on functions that come with every installation of R. However, the tidyverse collection of add-on packages provides a very smooth and simple solution for merging multiple data frames in a list simultaneously. Let’s install and load the tidyverse packages (to be precise, we only need the dplyr and purrr packages for the following example):

install.packages("tidyverse")                                  # Install tidyverse package
library("tidyverse")                                           # Load tidyverse package

Now, we can use the reduce function of the purrr package, which is part of the tidyverse (note the lowercase r), to join our multiple data sets in one line of R syntax:

data_list %>% reduce(inner_join, by = "id")                    # Apply reduce function of tidyverse

Much easier than Base R if you ask me, but that’s probably a matter of taste 🙂
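The same pattern works with the other dplyr join functions. The sketch below assumes that the dplyr and purrr packages are installed. It calls purrr::reduce() and dplyr::inner_join() / dplyr::full_join() with explicit namespace prefixes. This avoids two common errors: “object ‘inner_join’ not found” when dplyr is not attached, and a different reduce() being picked up when another loaded package (for instance, some Bioconductor packages) masks purrr’s version:

```r
# Example data as created above
data1 <- data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
data_list <- list(data1, data2, data3)

# Namespace-qualified calls work without library(tidyverse)
# and are not affected by other packages that define their own reduce()
inner <- purrr::reduce(data_list, dplyr::inner_join, by = "id")   # ids in all data frames
full  <- purrr::reduce(data_list, dplyr::full_join,  by = "id")   # ids in any data frame

inner
full
```

Here, by = "id" is passed on by reduce() to each join call, just like in the one-liner above.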

 

Video & Further Resources

Please have a look at the following video on my YouTube channel, in which I explain the contents of this R tutorial in more detail.

 

 

In addition to the video, I recommend reading some of the other articles on the Statistics Globe website.

 

On this page, you learned how to merge multiple data frames in a list using base R and the tidyverse. Please let me know in the comments section if you have any further comments or questions.

 



10 Comments

  • Hi Joachim,

    I want to fully join two datasets but keep the values of both.

    data1 <- data.frame(id = 1:6,
    x1 = c(5, 1, 4, 9, 1, 2),
    x2 = c("A", "Y", "G", "F", "G", "Y"))

    data2 <- data.frame(id = 1:6,
    x1 = c(5, 1, 3, 9, 7, 2),
    x2 = c("K", "Y", "G", "T", "G", "L"))
    I want to join both data frames without losing any information. So, in case the value is identical, then keep it, but in case the values are different then store them as a list.

    Could you help me out here?

  • Thank you for the tutorial. But “inner_join” in tidyverse isn’t working. I ran the same code that you provided.
    Error in as_mapper(.f, …) : object ‘inner_join’ not found
    I have a big data list of RNAseq data. It is a bit hard for me to type the name of each list. Could you please help me with that. I would like to use tidyverse to merge my list.

    • Hey Iroda,

      Based on your error message, it seems like you have specified the function inner_join in the wrong place. inner_join is a function and not an object, as indicated by your error message. Please verify that you have used the functions in your code properly.

      Regards,
      Joachim

  • hello,
    Thank you for this tutorial. I have an RDS file that contains 14 data frames. I read them separately, and I need to merge them so that they can be read as a MAF file. I tried the above functions on them, and it keeps giving me this error:
    {Error in (function (classes, fdef, mtable) :
    unable to find an inherited method for function ‘reduce’ for signature ‘”list”’}

    would you help me with that please,
    Thank you in advance

    • Hey Esraa,

      Please excuse the delayed response. I was on a long vacation, so unfortunately I wasn’t able to get back to you earlier. Do you still need help with your code?

      Regards,
      Joachim

  • Thank you, Joachim, and welcome back! I figured it out, but thank you anyway.

    Best,
    Esraa

  • Hello Joachim, your videos have been helpful, but I have a current issue. I was asked to do the following:
    Read in the tab-delimited file country_population.txt and call it population. This file describes the
    population of different countries. Rename the second column to “Country” and the column about the world population to percent. Hint: use read_tsv().

    I tried the steps below, and I am still getting an error message in RStudio:

    library(tidyverse)
    Population = read.tsv("country_population", Sep = "\t", header=TRUE)
    Population <- read.csv("country_population", header=TRUE)

    Warning: cannot open file 'country_population': No such file or directory
    Error in file(file, "rt") : cannot open the connection

    • Hello Patty,

      Thank you for your kind words. Apparently, your file path is not correct. Is the data stored on your PC? If so, check this link to see how to import csv and tsv files in R. Your file path should look like the one in the link. For further info on file paths, see this link. If you still have trouble, please let me know.

      Regards,
      Cansu

