Merge Two Unequal Data Frames & Replace NA with 0 in R (Example)

 

In this tutorial, I’ll show how to join two unequal data frames and replace missing values with zero in R.

The page covers the following topics: the example data, merging the data frames and replacing NA with zero, and further resources.

It’s time to dive into the R syntax…

 

Exemplifying Data

The data below is used as the basis for this R programming language tutorial:

data1 <- data.frame(id = 1:5,      # Create first data frame
                    x1 = 5:9,
                    x2 = 5:1)
data1                              # Print data
#   id x1 x2
# 1  1  5  5
# 2  2  6  4
# 3  3  7  3
# 4  4  8  2
# 5  5  9  1
data2 <- data.frame(id = 3:7,      # Create second data frame
                    y1 = 20:24,
                    y2 = 10:14)
data2                              # Print data
#   id y1 y2
# 1  3 20 10
# 2  4 21 11
# 3  5 22 12
# 4  6 23 13
# 5  7 24 14

Have a look at the previous output of the RStudio console. It shows that both example data frames consist of three columns, one of which is an ID variable. However, you can also see that the IDs are not equal in the two data frames: data1 contains the IDs 1 to 5 and data2 contains the IDs 3 to 7, so only the IDs 3, 4, and 5 appear in both.
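
The overlap between the two ID columns can also be checked in code. The following is a minimal sketch using the base R functions intersect() and setdiff(); the object names shared_ids, only_in_data1, and only_in_data2 are introduced here for illustration only.

```r
# Recreate the example data so that this snippet runs on its own
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)

# Illustrative object names (not part of the tutorial's own code)
shared_ids    <- intersect(data1$id, data2$id)  # IDs present in both data frames
only_in_data1 <- setdiff(data1$id, data2$id)    # IDs that occur only in data1
only_in_data2 <- setdiff(data2$id, data1$id)    # IDs that occur only in data2

shared_ids                                      # 3 4 5
only_in_data1                                   # 1 2
only_in_data2                                   # 6 7
```

The IDs that occur in only one data frame are the ones that lead to NA values once both data frames are fully merged.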

 

Example: Merging Data & Replacing NA with Zero

In this example, I’ll show how to combine two unequal data frames and how to replace the resulting NA values with 0.

First, we merge the two data frames:

data_all <- merge(data1, data2,    # Merge data
                  by = "id",
                  all = TRUE)
data_all                           # Print data
#   id x1 x2 y1 y2
# 1  1  5  5 NA NA
# 2  2  6  4 NA NA
# 3  3  7  3 20 10
# 4  4  8  2 21 11
# 5  5  9  1 22 12
# 6  6 NA NA 23 13
# 7  7 NA NA 24 14
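
To see where the NA values come from, it may help to compare this merge with the default behavior of merge() and to count the missing values per column. This is only a sketch; data_inner and na_per_column are names introduced here for illustration.

```r
# Recreate the example data so that this snippet runs on its own
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)

data_all   <- merge(data1, data2, by = "id", all = TRUE)  # Keep all IDs of both data frames
data_inner <- merge(data1, data2, by = "id")              # Default (all = FALSE): shared IDs only

nrow(data_all)                                            # 7
nrow(data_inner)                                          # 3

na_per_column <- colSums(is.na(data_all))                 # Count NA values in each column
na_per_column
# id x1 x2 y1 y2
#  0  2  2  2  2
```

With the default all = FALSE, only the IDs 3, 4, and 5 remain and no NA values occur. The missing values are therefore a direct consequence of keeping the unmatched rows.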

As you can see based on the output of data_all, we have merged our two input data sets. Because we specified all = TRUE, the rows of both data frames are kept, and the cells without a match in the other data frame are set to NA. We can now replace these missing values with zero:

data_all[is.na(data_all)] <- 0     # Replace NA with 0
data_all                           # Print data
#   id x1 x2 y1 y2
# 1  1  5  5  0  0
# 2  2  6  4  0  0
# 3  3  7  3 20 10
# 4  4  8  2 21 11
# 5  5  9  1 22 12
# 6  6  0  0 23 13
# 7  7  0  0 24 14

Looks good! But note that such a replacement should only be done with theoretical justification, i.e. when a missing value really corresponds to a value of zero. Otherwise, the results created based on the merged data may be biased.
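
One possible way to keep such a replacement transparent is to restrict it to selected columns and to record which rows were affected. The sketch below assumes, purely for illustration, that only y1 and y2 may be set to zero; the names y_cols and y_was_missing are hypothetical and not part of the tutorial's own code.

```r
# Recreate the merged example data so that this snippet runs on its own
data1 <- data.frame(id = 1:5, x1 = 5:9, x2 = 5:1)
data2 <- data.frame(id = 3:7, y1 = 20:24, y2 = 10:14)
data_all <- merge(data1, data2, by = "id", all = TRUE)

y_cols <- c("y1", "y2")                                      # Assumed: only these columns get zeros

data_all$y_was_missing <- !complete.cases(data_all[y_cols])  # Flag affected rows before replacing
data_all[y_cols][is.na(data_all[y_cols])] <- 0               # Replace NA with 0 in y1 and y2 only

data_all                                                     # x1 and x2 still contain NA for id 6 and 7
```

The flag column makes it possible to distinguish replaced zeros from observed values in later analyses, e.g. by excluding the flagged rows in a sensitivity check.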

 

Video, Further Resources & Summary

Would you like to know more about the merging of data frames? Then you might want to watch the following video which I have published on my YouTube channel. In the video, I’m illustrating the examples of this tutorial in RStudio.

 

 

In addition, you may want to have a look at some of the related articles of https://www.statisticsglobe.com/.

 

Summary: This tutorial showed how to merge two unequal data frames and replace the resulting NA values with zero in the R programming language. Please let me know in the comments if you have additional questions.

 



4 Comments

  • Hi, Joachim. You’re doing a great job. I have a question about merging data. I have two datasets with different sample names (IDs). Some variables are present in both datasets, others only in one, which means I have to merge by rows and columns. When I use this approach, I always get duplicate columns if the variable x is present in both datasets. Is there a solution how to merge such datasets?
