Merge Two Unequal Data Frames & Replace NA with 0 in R (Example)

 

In this tutorial, I’ll show how to join two unequal data frames and replace missing values with zero in R.

The page covers the following topics: the example data, merging the data and replacing NA with zero, and further resources.

It’s time to dive into the R syntax…

 

Exemplifying Data

The data below is used as the basis for this R programming language tutorial:

data1 <- data.frame(id = 1:5,      # Create first data frame
                    x1 = 5:9,
                    x2 = 5:1)
data1                              # Print data
#   id x1 x2
# 1  1  5  5
# 2  2  6  4
# 3  3  7  3
# 4  4  8  2
# 5  5  9  1
data2 <- data.frame(id = 3:7,      # Create second data frame
                    y1 = 20:24,
                    y2 = 10:14)
data2                              # Print data
#   id y1 y2
# 1  3 20 10
# 2  4 21 11
# 3  5 22 12
# 4  6 23 13
# 5  7 24 14

Have a look at the previous output of the RStudio console. It shows that our example data frames both consist of five rows and three columns, and that each of them has an id variable. However, you can also see that the ids are not equal in the two data frames: data1 contains the ids 1 to 5, data2 contains the ids 3 to 7, and only the ids 3, 4, and 5 appear in both.
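
Before merging, it can help to check the overlap of the ids explicitly. The following lines are a minimal sketch of such a check using the base R set functions intersect() and setdiff(). The object names ids_both, ids_only1, and ids_only2 are my own choice for illustration:

```r
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)        # Example data as above
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)

ids_both  <- intersect(data1$id, data2$id)   # ids contained in both data frames
ids_only1 <- setdiff(data1$id, data2$id)     # ids contained only in data1
ids_only2 <- setdiff(data2$id, data1$id)     # ids contained only in data2

ids_both                                     # Print overlapping ids
# [1] 3 4 5
ids_only1                                    # Print ids unique to data1
# [1] 1 2
ids_only2                                    # Print ids unique to data2
# [1] 6 7
```

The ids that are contained in only one of the data frames are the ones that will lead to NA values after a full merge.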

 

Example: Merging Data & Replacing NA with Zero

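In this example, I’ll show how to combine two unequal data frames and how to replace the resulting NA values with 0.

First, we merge the two data frames by their id column. The argument all = TRUE specifies a full outer join, i.e. the rows of all ids from both data frames are kept: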

data_all <- merge(data1, data2,    # Merge data
                  by = "id",
                  all = TRUE)
data_all                           # Print data
#   id x1 x2 y1 y2
# 1  1  5  5 NA NA
# 2  2  6  4 NA NA
# 3  3  7  3 20 10
# 4  4  8  2 21 11
# 5  5  9  1 22 12
# 6  6 NA NA 23 13
# 7  7 NA NA 24 14
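
In larger data sets the NA values are not as easy to spot as in this small example. As a rough sketch, the missing values per column can be counted with colSums() and is.na(). The object name na_counts is just an illustrative choice:

```r
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)        # Example data as above
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)
data_all <- merge(data1, data2, by = "id", all = TRUE)   # Full outer join

na_counts <- colSums(is.na(data_all))        # Count NA values per column
na_counts                                    # Print NA counts
# id x1 x2 y1 y2
#  0  2  2  2  2
```

Each of the non-id columns contains two NA values, one for each id that is missing in the corresponding input data frame.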

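As you can see based on the previous output, we have merged our two input data sets into a single data frame with seven rows. However, some of the cells of the merged data are NA, because the ids 1 and 2 do not exist in data2 and the ids 6 and 7 do not exist in data1. We can now replace these missing values with zero: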

data_all[is.na(data_all)] <- 0     # Replace NA with 0
data_all                           # Print data
#   id x1 x2 y1 y2
# 1  1  5  5  0  0
# 2  2  6  4  0  0
# 3  3  7  3 20 10
# 4  4  8  2 21 11
# 5  5  9  1 22 12
# 6  6  0  0 23 13
# 7  7  0  0 24 14

Looks good! However, note that such a replacement should only be done when it is theoretically justified, i.e. when a missing value really corresponds to a value of zero. Otherwise, results based on the merged data may be biased.
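
In case a replacement by zero is only justified for some of the variables, the replacement can be restricted to those columns. The code below is a minimal sketch under the assumption that only y1 and y2 should be set to zero. The names cols_zero and data_sel are hypothetical and only used for this illustration:

```r
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)        # Example data as above
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)
data_all <- merge(data1, data2, by = "id", all = TRUE)   # Full outer join

cols_zero <- c("y1", "y2")                   # Assumed columns where NA means 0
data_sel <- data_all                         # Work on a copy of the merged data
data_sel[cols_zero][is.na(data_sel[cols_zero])] <- 0     # Replace NA in these columns only
data_sel                                     # Print data
#   id x1 x2 y1 y2
# 1  1  5  5  0  0
# 2  2  6  4  0  0
# 3  3  7  3 20 10
# 4  4  8  2 21 11
# 5  5  9  1 22 12
# 6  6 NA NA 23 13
# 7  7 NA NA 24 14
```

The NA values in x1 and x2 are left untouched, so they can still be handled as missing values in later analyses.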

 

Video, Further Resources & Summary

Would you like to know more about the merging of data frames? Then you might want to watch the following video which I have published on my YouTube channel. In the video, I’m illustrating the examples of this tutorial in RStudio.

 

The YouTube video will be added soon.

 

In addition, you may want to have a look at some of the related articles on https://www.statisticsglobe.com/.

 

Summary: This tutorial showed how to merge two unequal data frames and replace the resulting NA values with zero in the R programming language. Please let me know in the comments if you have additional questions.

 


