Merge Two Unequal Data Frames & Replace NA with 0 in R (Example)
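In this tutorial, I’ll show how to join two unequal data frames and replace missing values with zero in R.
The page will consist of the following topics:
- Exemplifying Data
- Example: Merging Data & Replacing NA with Zero
- Video, Further Resources & Summary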
It’s time to dive into the R syntax…
Exemplifying Data
The data below is used as the basis for this R programming language tutorial:
    data1 <- data.frame(id = 1:5,       # Create first data frame
                        x1 = 5:9,
                        x2 = 5:1)
    data1                               # Print data
    #   id x1 x2
    # 1  1  5  5
    # 2  2  6  4
    # 3  3  7  3
    # 4  4  8  2
    # 5  5  9  1

    data2 <- data.frame(id = 3:7,       # Create second data frame
                        y1 = 20:24,
                        y2 = 10:14)
    data2                               # Print data
    #   id y1 y2
    # 1  3 20 10
    # 2  4 21 11
    # 3  5 22 12
    # 4  6 23 13
    # 5  7 24 14
Have a look at the previous output of the RStudio console. It shows that our example data frames both consist of three columns, and that each of them has an ID variable. However, you can also see that the IDs only partly overlap: data1 contains the IDs 1 to 5, while data2 contains the IDs 3 to 7.
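Before merging, it can help to check which IDs the two data frames share. The following is a minimal sketch using the base R functions intersect() and setdiff(); the object names shared_ids, only_in_data1, and only_in_data2 are made up for this illustration and are not part of the original example.

```r
# Recreate the example data frames from this tutorial
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)

# IDs that appear in both data frames
shared_ids <- intersect(data1$id, data2$id)

# IDs that appear in only one of the data frames
only_in_data1 <- setdiff(data1$id, data2$id)
only_in_data2 <- setdiff(data2$id, data1$id)

shared_ids                          # 3 4 5
only_in_data1                       # 1 2
only_in_data2                       # 6 7
```

The IDs that occur in only one data frame are the rows that will contain NA values after a full merge.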
Example: Merging Data & Replacing NA with Zero
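In this example, I’ll show how to combine two unequal data frames and how to replace the resulting NA values with 0.
First, we merge the two data frames by their id column. The argument all = TRUE keeps the rows of both data frames, even if an ID appears in only one of them: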
    data_all <- merge(data1, data2,     # Merge data
                      by = "id",
                      all = TRUE)
    data_all                            # Print data
    #   id x1 x2 y1 y2
    # 1  1  5  5 NA NA
    # 2  2  6  4 NA NA
    # 3  3  7  3 20 10
    # 4  4  8  2 21 11
    # 5  5  9  1 22 12
    # 6  6 NA NA 23 13
    # 7  7 NA NA 24 14
As you can see based on the previous output, we have merged our two input data frames. However, the cells for IDs that appear in only one of the data frames are NA. We can now replace these missing values with zero:
    data_all[is.na(data_all)] <- 0      # Replace NA with 0
    data_all                            # Print data
    #   id x1 x2 y1 y2
    # 1  1  5  5  0  0
    # 2  2  6  4  0  0
    # 3  3  7  3 20 10
    # 4  4  8  2 21 11
    # 5  5  9  1 22 12
    # 6  6  0  0 23 13
    # 7  7  0  0 24 14
Looks good! But note that such a replacement should only be done with theoretical justification. Otherwise the results created based on the merged data may be biased.
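If zero is only a defensible value for some of the variables, the replacement can be restricted to those columns. The following is a minimal sketch of that idea, not part of the original example: the choice of y1 and y2 as the columns to fill, and the names fill_cols and y_missing, are assumptions made purely for illustration.

```r
# Recreate the merged data frame from this tutorial
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)
data_all <- merge(data1, data2, by = "id", all = TRUE)

# Remember which rows had no match in data2 before overwriting anything
data_all$y_missing <- is.na(data_all$y1)

# Replace NA with 0 only in the selected columns (hypothetical choice)
fill_cols <- c("y1", "y2")
data_all[fill_cols][is.na(data_all[fill_cols])] <- 0

data_all                            # x1 and x2 still contain NA for the IDs 6 and 7
```

Keeping an indicator column such as y_missing makes it possible to distinguish the inserted zeros from zeros that were actually observed.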
Video, Further Resources & Summary
Would you like to know more about the merging of data frames? Then you might want to watch the following video which I have published on my YouTube channel. In the video, I’m illustrating the examples of this tutorial in RStudio.
In addition, you may want to have a look at some of the related articles on https://www.statisticsglobe.com/.
- Merge Data Frames by Column Names
- Merge Data Frames by Row Names
- Merge Multiple Data Frames in List
- Merge Time Series in R
- The R Programming Language
Summary: This tutorial showed how to merge two unequal data frames and replace the resulting NA values with zero in the R programming language. Please let me know in the comments in case you have additional questions.
4 Comments
Hi, Joachim. You’re doing a great job. I have a question about merging data. I have two datasets with different sample names (IDs). Some variables are present in both datasets, others only in one, which means I have to merge by rows and columns. When I use this approach, I always get duplicate columns if the variable x is present in both datasets. Is there a solution how to merge such datasets?
I think full_join should work.
Thanks anyway.
Hey Leon,
Thanks for sharing your solution!
Regards,
Joachim
Hey Leon,
Thanks for the kind feedback!
I’m glad you found a solution in the meantime!
Regards,
Joachim
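For the question raised in the comments above, duplicate columns such as x.x and x.y appear when a variable exists in both data frames but is not used as a merge key. The following is a minimal base R sketch of one possible way around this, assuming the shared variable holds the same kind of information in both data sets. The data frames a and b and their columns are hypothetical and only serve as an illustration. If by is not specified, merge() uses all column names that the two data frames have in common.

```r
# Hypothetical data sets with different IDs and one shared variable x
a <- data.frame(id = c("s1", "s2"), x = c(1, 2), u = c(10, 20))
b <- data.frame(id = c("s3", "s4"), x = c(3, 4), v = c(30, 40))

# Merging by id only duplicates the shared variable
dup <- merge(a, b, by = "id", all = TRUE)
names(dup)                          # "id" "x.x" "u" "x.y" "v"

# Merging by all shared columns keeps a single x column
m <- merge(a, b, all = TRUE)
names(m)                            # "id" "x" "u" "v"
```

The remaining NA values in u and v can then be handled as shown in the example of this tutorial, provided that this is justified for the data at hand.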