Import & Merge Multiple CSV Files in R (2 Examples)
In this article, I’ll show you how to import and merge CSV files in the R programming language.
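The page contains the following topics:
- Creation of Example Data
- Example 1: Import & Row-Bind CSV Files in R
- Example 2: Import & Join CSV Files in R
- Video & Further Resources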
Let’s do this!
Before we can start with the examples, we need to create an example directory containing multiple CSV files. First, we need to create several data frames…
data1 <- data.frame(id = 1:6,                          # Create first example data frame
                    x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9,                          # Create second example data frame
                    y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6,                          # Create third example data frame
                    z1 = c(3, 2),
                    z2 = c("K", "b"))
…and then we need to export these data frames as CSV files to our computer:
write.csv(data1, "C:/Users/Joach/Desktop/my_folder/data1.csv",   # Write first example data
          row.names = FALSE)
write.csv(data2, "C:/Users/Joach/Desktop/my_folder/data2.csv",   # Write second example data
          row.names = FALSE)
write.csv(data3, "C:/Users/Joach/Desktop/my_folder/data3.csv",   # Write third example data
          row.names = FALSE)
Figure 1: Exemplifying Directory with CSV Files.
Figure 1 illustrates what our example directory looks like. Now, let's import and combine these data sets in RStudio…
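In case the Windows folder used above does not exist on your computer, the following minimal sketch creates the same three files in a temporary folder instead. It assumes a folder called my_folder inside the directory returned by base R's tempdir(), which works on any operating system. Note also that the pattern argument of list.files() is a regular expression, so "\\.csv$" selects exactly the file names that end in .csv.

```r
# Create the same three example data frames as above
data1 <- data.frame(id = 1:6,
                    x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9,
                    y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6,
                    z1 = c(3, 2),
                    z2 = c("K", "b"))

# Assumption: a folder "my_folder" inside the session's temporary directory
my_folder <- file.path(tempdir(), "my_folder")
dir.create(my_folder, showWarnings = FALSE)

# Export each data frame as a CSV file without row names
write.csv(data1, file.path(my_folder, "data1.csv"), row.names = FALSE)
write.csv(data2, file.path(my_folder, "data2.csv"), row.names = FALSE)
write.csv(data3, file.path(my_folder, "data3.csv"), row.names = FALSE)

# 'pattern' is a regular expression: keep only names ending in ".csv"
csv_files <- list.files(my_folder, pattern = "\\.csv$", full.names = TRUE)
basename(csv_files)   # "data1.csv" "data2.csv" "data3.csv"
```

If you use this variant, replace the Windows path in the following examples with my_folder.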
Example 1: Import & Row-Bind CSV Files in R
First, we need to install and load the dplyr and readr packages:
install.packages("dplyr")   # Install dplyr package
install.packages("readr")   # Install readr package
library("dplyr")            # Load dplyr package
library("readr")            # Load readr package
data_all <- list.files(path = "C:/Users/Joach/Desktop/my_folder",   # Identify all CSV files
                       pattern = "\\.csv$",                         # Regular expression: names ending in .csv
                       full.names = TRUE) %>%
  lapply(read_csv) %>%                                              # Store all files in list
  bind_rows()                                                       # Combine data sets into one data set
data_all                                                            # Print data to RStudio console
Table 1: Tibble Containing Three Data Sets.
Table 1 shows the output of the previous R code. As you can see, our three data sets were combined vertically into a single data set. Cells were set to NA wherever a variable was not contained in the data set that a row came from.
Note that our previous R syntax has created a tibble (the data frame variant used by the tidyverse packages) rather than a plain data.frame. In case you prefer to work with data frames, you can simply convert this tibble as follows:
as.data.frame(data_all) # Convert tibble to data.frame
Table 2: Convert Tibble to Data Frame.
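You may have noticed that we have simply stacked the rows of our three data frames on top of each other. As a result, some id values occur several times; for instance, the ids 5 and 6 appear once for each of the three data sets.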
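If you would rather not depend on dplyr and readr, the same row-binding can be sketched in base R: read each file with read.csv(), add the columns a data frame lacks as NA, and stack everything with rbind(). The helper name bind_rows_base is hypothetical (it is not part of any package), and the files are written to a temporary folder so that the sketch runs on its own.

```r
# Recreate the example files in a temporary folder (assumption for self-containment)
my_folder <- file.path(tempdir(), "my_folder")
dir.create(my_folder, showWarnings = FALSE)
write.csv(data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2),
                     x2 = c("A", "Y", "G", "F", "G", "Y")),
          file.path(my_folder, "data1.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9),
                     y2 = c("a", "x", "a", "x", "a", "x")),
          file.path(my_folder, "data2.csv"), row.names = FALSE)
write.csv(data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b")),
          file.path(my_folder, "data3.csv"), row.names = FALSE)

# Hypothetical helper: stack data frames with differing columns,
# filling the columns that a data frame lacks with NA
bind_rows_base <- function(df_list) {
  all_cols <- unique(unlist(lapply(df_list, names)))
  padded <- lapply(df_list, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]                        # same column order in every data frame
  })
  do.call(rbind, padded)
}

csv_files <- list.files(my_folder, pattern = "\\.csv$", full.names = TRUE)
data_all_base <- bind_rows_base(lapply(csv_files, read.csv))
data_all_base   # 14 rows: 6 + 6 + 2
```

Like the dplyr version, this keeps every row of every file, so the ids 4, 5, and 6 still occur more than once.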
In the next example, I’ll explain how to merge our data frames based on a shared ID variable, so keep on reading!
Example 2: Import & Join CSV Files in R
As you can see in the previous tables, all of our example data frames contain an id column. In this example, we’ll use this id column to merge our data (instead of just row-binding it as in Example 1).
For this task, we first have to install and load the purrr package:
install.packages("purrr")   # Install purrr package
library("purrr")            # Load purrr package
Next, we can apply the reduce function of the purrr package together with the full_join function of the dplyr package to join our data frames based on the id variable:
data_join <- list.files(path = "C:/Users/Joach/Desktop/my_folder",   # Identify all CSV files
                        pattern = "\\.csv$",                         # Regular expression: names ending in .csv
                        full.names = TRUE) %>%
  lapply(read_csv) %>%                                               # Store all files in list
  reduce(full_join, by = "id")                                       # Full-join data sets into one data set
data_join                                                            # Print data to RStudio console
Table 3: Tibble Containing Three Data Sets Merged by ID Column.
As you can see based on Table 3, we have created another tibble containing our three data sets. However, in contrast to the previous example, we have joined our data sets based on the id column. Each id therefore appears in only one row, and the variables of the three data sets are placed side by side.
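For comparison, a base R sketch of the same full join could replace purrr's reduce() and dplyr's full_join() with Reduce() and merge(..., all = TRUE). Reduce() folds the list pairwise, i.e. it computes merge(merge(data1, data2), data3). As before, writing the files to a temporary folder is an assumption made only to keep the sketch runnable on any system.

```r
# Recreate the example files in a temporary folder (assumption for self-containment)
my_folder <- file.path(tempdir(), "my_folder")
dir.create(my_folder, showWarnings = FALSE)
write.csv(data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2),
                     x2 = c("A", "Y", "G", "F", "G", "Y")),
          file.path(my_folder, "data1.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9),
                     y2 = c("a", "x", "a", "x", "a", "x")),
          file.path(my_folder, "data2.csv"), row.names = FALSE)
write.csv(data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b")),
          file.path(my_folder, "data3.csv"), row.names = FALSE)

csv_files <- list.files(my_folder, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(csv_files, read.csv)   # One data frame per file

# all = TRUE keeps ids that occur in only some of the files (full outer join)
data_join_base <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE),
                         data_list)
data_join_base   # 9 rows, one per id from 1 to 9
```

One difference to full_join() is that merge() sorts the result by the key column by default (sort = TRUE).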
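Note that we have used a full join to combine our data frames, i.e. every id that occurs in at least one of the files is kept. In case you want to learn more about the different types of joins, you may have a look at the other dplyr join functions such as inner_join, left_join, and right_join.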
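To see how the join type changes the result, the sketch below compares a full join with an inner join, which keeps only the ids present in every data set. It uses base R's merge() with all = TRUE and all = FALSE, and builds the data frames in memory instead of reading the CSV files to keep it short. The helper name join_all is hypothetical.

```r
# The three example data frames from the beginning of the article
data1 <- data.frame(id = 1:6, x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9, y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6, z1 = c(3, 2), z2 = c("K", "b"))
data_list <- list(data1, data2, data3)

# Hypothetical helper: join all data frames in a list by "id"
join_all <- function(dfs, all) {
  Reduce(function(x, y) merge(x, y, by = "id", all = all), dfs)
}

full_joined  <- join_all(data_list, all = TRUE)    # Ids found in any data set
inner_joined <- join_all(data_list, all = FALSE)   # Ids found in all data sets

nrow(full_joined)   # 9
inner_joined$id     # 5 6
```

With the packages loaded in Example 2, the corresponding inner join would be reduce(inner_join, by = "id").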
Video & Further Resources
Have a look at the following video that I have recently released on my Statistics Globe YouTube channel. I illustrate the contents of this article in more detail in the video.
In addition, I recommend reading the following articles on this website:
- Read XLSX and XLS Files to R
- The dir Function in R
- Write & Read Multiple CSV Files Using for-Loop
- R Functions List (+ Examples)
- The R Programming Language
On this page, I have shown you how to import and combine all CSV files in a folder in the R programming language. Let me know in the comments section in case you have further questions.