Import & Merge Multiple csv Files in R (Example)
In this article, I’ll show you how to import and merge csv files in the R programming language.
The page will contain the following topics:
Let’s do this!
Exemplifying Data
Before we can start with the example, we need to create an exemplifying directory including multiple csv files. First, we need to create several data frames…
data1 <- data.frame(id = 1:6, # Create first example data frame x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y")) data2 <- data.frame(id = 4:9, # Create second example data frame y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x")) data3 <- data.frame(id = 5:6, # Create third example data frame z1 = c(3, 2), z2 = c("K", "b")) |
data1 <- data.frame(id = 1:6, # Create first example data frame x1 = c(5, 1, 4, 9, 1, 2), x2 = c("A", "Y", "G", "F", "G", "Y")) data2 <- data.frame(id = 4:9, # Create second example data frame y1 = c(3, 3, 4, 1, 2, 9), y2 = c("a", "x", "a", "x", "a", "x")) data3 <- data.frame(id = 5:6, # Create third example data frame z1 = c(3, 2), z2 = c("K", "b"))
…and then we need to export these data frames as csv files to our computer:
write.csv(data1, "C:/...Your Path.../my_folder/data1.csv") # Write first example data frame write.csv(data2, "C:/...Your Path.../my_folder/data2.csv") # Write second example data frame write.csv(data3, "C:/...Your Path.../my_folder/data3.csv") # Write third example data frame |
write.csv(data1, "C:/...Your Path.../my_folder/data1.csv") # Write first example data frame write.csv(data2, "C:/...Your Path.../my_folder/data2.csv") # Write second example data frame write.csv(data3, "C:/...Your Path.../my_folder/data3.csv") # Write third example data frame
Figure 1: Exemplifying Directory with csv Files.
Figure 1 illustrates how our example directory looks like. Now let’s import and combine these data sets in RStudio…
Import & Load csv Files in R
We need three R add-on packages for the following R syntax: dplyr, plyr, and readr. Let’s install and load the packages to R.
install.packages("dplyr") # Install dplyr package install.packages("plyr") # Install plyr package install.packages("readr") # Install readr package library("dplyr") # Load dplyr package library("plyr") # Load plyr package library("readr") # Load readr package |
install.packages("dplyr") # Install dplyr package install.packages("plyr") # Install plyr package install.packages("readr") # Install readr package library("dplyr") # Load dplyr package library("plyr") # Load plyr package library("readr") # Load readr package
Now, we can import and merge the example csv files based on the list.files, lapply, read_csv, and bind_rows functions:
data_all <- list.files(path = "C:/...Your Path.../my_folder", # Identify all csv files in folder pattern = "*.csv", full.names = TRUE) %>% lapply(read_csv) %>% # Store all files in list bind_rows # Combine data sets into one data set data_all # Print data to RStudio console |
data_all <- list.files(path = "C:/...Your Path.../my_folder", # Identify all csv files in folder pattern = "*.csv", full.names = TRUE) %>% lapply(read_csv) %>% # Store all files in list bind_rows # Combine data sets into one data set data_all # Print data to RStudio console
Table 1: Tibble Containing Three Data Sets.
Table 1 shows the output of the previous R code. As you can see, our three data sets were combined in a single data set. Data set cells were set to NA, in case a variable was not included in all data sets.
Note that our previous R syntax created a tibble instead of a data frame. In case you prefer to work with data frames, you could simply convert this tibble to a data frame as follows:
as.data.frame(data_all) # Convert tibble to data.frame |
as.data.frame(data_all) # Convert tibble to data.frame
Video & Further Resources
Have a look at the following video that I have recently released on my Statistics Globe YouTube channel. I illustrate the contents of this article in more detail in the video.
The YouTube video will be added soon.
Please also note that the previous R code merged our data files by just column binding the different data sources to each other. However, it would also be possible to merge our data sets by a shared column name in order to avoid duplicated observations (i.e. our data sets contain an ID column). You could learn more about this type of merging HERE.
In addition, I can recommend to read the other articles on this homepage.
- Read XLSX and XLS Files to R
- The dir Function in R
- Write & Read Multiple CSV Files Using for-Loop
- R Functions List (+ Examples)
- The R Programming Language
On this page I showed you how to combine all csv files in a folder in the R programming language. Let me know in the comments section, in case you have further questions.
7 Comments. Leave new
Hi Joachim,
First of all, thank you for the content.
I have tried to use this content to sort out one task, but no success.
Scenario:
I have a few csv files in a folder, I want to merge all of them using an apply function.
Step 1 – I tried first listing all the files from a folder.
Step 2 – Save all csv files from the step 1 to the work directory using an apply function. ( failed )
Step 3 – Merge all the files.
This is an easy process in Power BI using power query, I am trying to reproduce it using R.
Hey Fernando,
You are very welcome, thank you for the kind words!
Have you tried the example code of this tutorial? What kind of error message did you get? I need some more details 🙂
Regards,
Joachim
This is amazing, thank you so very much!
Thanks Lou, I’m glad to hear this!
Hello,
Thank you for this.
I tried your suggestion and what if I have one identical column for each file and I want all my files ordered by this identical column as with ‘by’ in merge (merge(data_1, data_2, by=c(“participant”, “X1”)))
Hi Alex,
You are very welcome!
Please have a look at the order function, specifically Example 5 of this tutorial: https://statisticsglobe.com/sort-order-rank-r-function-example
Regards,
Joachim
Hello Joachim,
Thank you for your answer but to merge 2 files with merge function, I can define a unique column as a reference for the merge (merge(data_1, data_2, by=c(“participant”, “X1”))). Here my 2 files will be merge according 2 specific and unique columns (with no duplication of my columns participant and X1). If I want to merge serveral csv with each a unique column that can give me the right association between all my file, is it possible ? I take a look on your example 5 (https://statisticsglobe.com/sort-order-rank-r-function-example) but it does not seem to answer my question. Until now, I am forced to use merge in an incremental way to merge all my csv because I specifically need the right association between my files with columns participant and X1.