Merge Two Unequal Data Frames & Replace NA with 0 in R (Example)

In this tutorial, I’ll show how to join two unequal data frames and replace the resulting missing values with zero in R.

The page will consist of the following topics:

1) Exemplifying Data

2) Example: Merging Data & Replacing NA with Zero

3) Video, Further Resources & Summary

It’s time to dive into the R syntax…

Exemplifying Data

The data below is used as the basis for this R programming language tutorial:

data1 <- data.frame(id = 1:5,      # Create first data frame
                    x1 = 5:9,
                    x2 = 5:1)
data1                              # Print data
#   id x1 x2
# 1  1  5  5
# 2  2  6  4
# 3  3  7  3
# 4  4  8  2
# 5  5  9  1
data2 <- data.frame(id = 3:7,      # Create second data frame
                    y1 = 20:24,
                    y2 = 10:14)
data2                              # Print data
#   id y1 y2
# 1  3 20 10
# 2  4 21 11
# 3  5 22 12
# 4  6 23 13
# 5  7 24 14

Have a look at the previous output of the RStudio console. It shows that both example data frames consist of five rows and three columns, one of which is an id variable. However, the IDs only partly overlap: data1 contains the IDs 1 to 5, whereas data2 contains the IDs 3 to 7.
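
To make the mismatch explicit before merging, the two id columns can be compared with base R set functions. The following is a minimal sketch that re-creates the example data; the object names shared_ids, only_data1 and only_data2 are chosen here for illustration and are not part of the original example.

```r
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)        # Example data from above
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)

shared_ids <- intersect(data1$id, data2$id)   # IDs present in both data frames
only_data1 <- setdiff(data1$id, data2$id)     # IDs that occur only in data1
only_data2 <- setdiff(data2$id, data1$id)     # IDs that occur only in data2

shared_ids                                    # Print shared IDs
# [1] 3 4 5
only_data1                                    # Print IDs unique to data1
# [1] 1 2
only_data2                                    # Print IDs unique to data2
# [1] 6 7
```

The IDs listed in only_data1 and only_data2 are the rows that will contain NA values after a full merge.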

Example: Merging Data & Replacing NA with Zero

In this example, I’ll show how to combine two unequal data frames and how to replace the resulting NA values with 0.

First, we merge the two data frames by their id column. Setting all = TRUE keeps every ID from both data frames:

data_all <- merge(data1, data2,    # Merge data
                  by = "id",
                  all = TRUE)
data_all                           # Print data
#   id x1 x2 y1 y2
# 1  1  5  5 NA NA
# 2  2  6  4 NA NA
# 3  3  7  3 20 10
# 4  4  8  2 21 11
# 5  5  9  1 22 12
# 6  6 NA NA 23 13
# 7  7 NA NA 24 14

As the previous output shows, the merged data frame contains all seven IDs of our two input data sets. However, the cells are NA wherever an ID occurs in only one of the two data frames. We can now replace these missing values with zero:

data_all[is.na(data_all)] <- 0     # Replace NA with 0
data_all                           # Print data
#   id x1 x2 y1 y2
# 1  1  5  5  0  0
# 2  2  6  4  0  0
# 3  3  7  3 20 10
# 4  4  8  2 21 11
# 5  5  9  1 22 12
# 6  6  0  0 23 13
# 7  7  0  0 24 14

Looks good! Note, however, that such a replacement should only be done with theoretical justification, i.e. when a missing value can genuinely be interpreted as zero. Otherwise, results based on the merged data may be biased.
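
If zero is a sensible stand-in for only some variables, it may help to count the NA values first and then restrict the replacement to the relevant columns. The sketch below assumes that only x1, x2, y1 and y2 should be filled and that id should stay untouched; the names na_counts, value_cols and data_zero are chosen for illustration and are not part of the original example.

```r
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)        # Example data from above
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)
data_all <- merge(data1, data2, by = "id", all = TRUE)   # Full merge as before

na_counts <- colSums(is.na(data_all))         # Count NA values per column
na_counts                                     # Print NA counts
# id x1 x2 y1 y2
#  0  2  2  2  2

value_cols <- c("x1", "x2", "y1", "y2")       # Assumed columns to fill with 0
data_zero <- data_all                         # Keep data_all with its NAs intact
data_zero[value_cols][is.na(data_zero[value_cols])] <- 0  # Replace NA in these columns only
data_zero                                     # Print data
```

Working on a copy keeps the original NA pattern available in data_all, so the zero-filled version can be compared against it later.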

Video, Further Resources & Summary

Would you like to know more about the merging of data frames? Then you might want to watch the following video which I have published on my YouTube channel. In the video, I’m illustrating the examples of this tutorial in RStudio.

In addition, you may want to have a look at some of the related articles on https://www.statisticsglobe.com/.

Summary: This tutorial showed how to merge two unequal data frames and replace the resulting NA values with zero in the R programming language. Please let me know in the comments if you have additional questions.

4 Comments

Leon
March 17, 2022 6:35 pm

Hi, Joachim. You’re doing a great job. I have a question about merging data. I have two datasets with different sample names (IDs). Some variables are present in both datasets, others only in one, which means I have to merge by rows and columns. When I use this approach, I always get duplicate columns if the variable x is present in both datasets. Is there a solution how to merge such datasets?

- Leon
  March 17, 2022 6:56 pm
  
  I think full_join should work.
  Thanks anyway.
  
  - Joachim
    March 18, 2022 7:36 am
    
    Hey Leon,
    
    Thanks for sharing your solution!
    
    Regards,
    Joachim
    
- Joachim
  March 18, 2022 7:35 am
  
  Hey Leon,
  
  Thanks for the kind feedback!
  
  I’m glad you found a solution in the meantime!
  
  Regards,
  Joachim
  