Import & Merge Multiple csv Files in R (Example)

 

In this article, I’ll show you how to import and merge csv files in the R programming language.

The page will contain the following topics:

Let’s do this!

 

Exemplifying Data

Before we can start with the example, we need to create an exemplifying directory including multiple csv files. First, we need to create several data frames

data1 <- data.frame(id = 1:6,                                     # Create first example data frame
                    x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
 
data2 <- data.frame(id = 4:9,                                     # Create second example data frame
                    y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
 
data3 <- data.frame(id = 5:6,                                     # Create third example data frame
                    z1 = c(3, 2),
                    z2 = c("K", "b"))

…and then we need to export these data frames as csv files to our computer:

write.csv(data1, "C:/...Your Path.../my_folder/data1.csv")        # Write first example data frame
write.csv(data2, "C:/...Your Path.../my_folder/data2.csv")        # Write second example data frame
write.csv(data3, "C:/...Your Path.../my_folder/data3.csv")        # Write third example data frame

 

example directory with csv files

Figure 1: Exemplifying Directory with csv Files.

 

Figure 1 illustrates how our example directory looks like. Now let’s import and combine these data sets in RStudio…

 

Import & Load csv Files in R

We need three R add-on packages for the following R syntax: dplyr, plyr, and readr. Let’s install and load the packages to R.

install.packages("dplyr")                                         # Install dplyr package
install.packages("plyr")                                          # Install plyr package
install.packages("readr")                                         # Install readr package
 
library("dplyr")                                                  # Load dplyr package
library("plyr")                                                   # Load plyr package
library("readr")                                                  # Load readr package

Now, we can import and merge the example csv files based on the list.files, lapply, read_csv, and bind_rows functions:

data_all <- list.files(path = "C:/...Your Path.../my_folder",     # Identify all csv files in folder
                       pattern = "*.csv", full.names = TRUE) %>% 
  lapply(read_csv) %>%                                            # Store all files in list
  bind_rows                                                       # Combine data sets into one data set 
data_all                                                          # Print data to RStudio console

 

tibble with merged data files r

Table 1: Tibble Containing Three Data Sets.

 

Table 1 shows the output of the previous R code. As you can see, our three data sets were combined in a single data set. Data set cells were set to NA, in case a variable was not included in all data sets.

Note that our previous R syntax created a tibble instead of a data frame. In case you prefer to work with data frames, you could simply convert this tibble to a data frame as follows:

as.data.frame(data_all)                                           # Convert tibble to data.frame

 

Video & Further Resources

Have a look at the following video that I have recently released on my Statistics Globe YouTube channel. I illustrate the contents of this article in more detail in the video.

 

The YouTube video will be added soon.

 

Please also note that the previous R code merged our data files by just column binding the different data sources to each other. However, it would also be possible to merge our data sets by a shared column name in order to avoid duplicated observations (i.e. our data sets contain an ID column). You could learn more about this type of merging HERE.

In addition, I can recommend to read the other articles on this homepage.

 

On this page I showed you how to combine all csv files in a folder in the R programming language. Let me know in the comments section, in case you have further questions.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


12 Comments. Leave new

  • Fernando Camargo
    October 7, 2020 8:08 pm

    Hi Joachim,

    First of all, thank you for the content.

    I have tried to use this content to sort out one task, but no success.

    Scenario:

    I have a few csv files in a folder, I want to merge all of them using an apply function.

    Step 1 – I tried first listing all the files from a folder.

    Step 2 – Save all csv files from the step 1 to the work directory using an apply function. ( failed )

    Step 3 – Merge all the files.

    This is an easy process in Power BI using power query, I am trying to reproduce it using R.

    Reply
    • Hey Fernando,

      You are very welcome, thank you for the kind words!

      Have you tried the example code of this tutorial? What kind of error message did you get? I need some more details 🙂

      Regards,

      Joachim

      Reply
  • This is amazing, thank you so very much!

    Reply
  • Hello,

    Thank you for this.
    I tried your suggestion and what if I have one identical column for each file and I want all my files ordered by this identical column as with ‘by’ in merge (merge(data_1, data_2, by=c(“participant”, “X1”)))

    Reply
  • Hello Joachim,

    Thank you for your answer but to merge 2 files with merge function, I can define a unique column as a reference for the merge (merge(data_1, data_2, by=c(“participant”, “X1”))). Here my 2 files will be merge according 2 specific and unique columns (with no duplication of my columns participant and X1). If I want to merge serveral csv with each a unique column that can give me the right association between all my file, is it possible ? I take a look on your example 5 (https://statisticsglobe.com/sort-order-rank-r-function-example) but it does not seem to answer my question. Until now, I am forced to use merge in an incremental way to merge all my csv because I specifically need the right association between my files with columns participant and X1.

    Reply
    • Hey Alex,

      Thank you for the follow up.

      Does this (or something similar to this) work?

      all_files <- list.files()
      data_all <- read.csv(all_files[1])
      for(i in 2:length(all_files)) {
        data_i <- read.csv(all_files[i])
        data_all <- merge(data_all, data_i, by = c("participant", "X1"))
      }

      Regards,

      Joachim

      Reply
  • Joachim,

    The csv files in my directory have the following problem:
    for data1, variable x1, the class is numeric.
    for data2, variable x1, the class is character.

    example
    data1 <- data.frame(id = 1:6, # Create first example data frame
    x1 = c(5, 1, 4, 9, 1, 2),
    x2 = c("A", "Y", "G", "F", "G", "Y"))

    data2 <- data.frame(id = 4:9, # Create second example data frame
    x1 = c("none", "none", "none", "none", "none", "none"),
    y1 = c(3, 3, 4, 1, 2, 9),
    y2 = c("a", "x", "a", "x", "a", "x"))

    I want the new, combined dataset to have variable x1 look like this:
    x1 = c(5, 1, 4, 9, 1, 2, NA, NA, NA, NA, NA, NA)
    instead I get an error:
    Error: Can't combine `x1` and `x1` .

    Where/how in your series of pipe operators could I use the function as.numeric() on x1 variable – for all datasets in the list?
    or na.strings = c(“none”)?

    Thank you,
    Tommy

    Reply
    • Hi Tommy,

      In this case I would do this:

      1) Import the two data frames one-by-one
      2) Harmonize data frame classes using the as.numeric function
      3) Merge data frames

      I hope that helps!

      Joachim

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
Top