Find Rows in First Data Frame that are not in Second in R (Example)

 

In this R tutorial, you’ll learn how to identify rows that are contained in the first data frame but not in the second data frame.

You’re here for the answer, so let’s get straight to the R code.

 

Constructing Example Data

The following data will be used as the first data frame in this R programming language tutorial:

data1 <- data.frame(x1 = 1:5,          # Create first data frame
                    x2 = letters[1:5])
data1                                  # Print first data frame
#   x1 x2
# 1  1  a
# 2  2  b
# 3  3  c
# 4  4  d
# 5  5  e

As the previous output of the RStudio console shows, our first example data frame consists of five rows and two columns. The variable x1 contains the integers from 1 to 5, and the variable x2 contains the lowercase letters from a to e.
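The description above can be checked directly in R. The following is a minimal sketch using the base R functions dim and str. Note that the class shown for x2 depends on the R version: character by default since R 4.0.0, factor in earlier versions.

```r
data1 <- data.frame(x1 = 1:5,          # Recreate first data frame
                    x2 = letters[1:5])
dim(data1)                             # Number of rows and columns
# [1] 5 2
str(data1)                             # Compact overview of column types
```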

Let’s create a second data frame in R:

data2 <- data.frame(x1 = 3:6,          # Create second data frame
                    x2 = letters[3:6])
data2                                  # Print second data frame
#   x1 x2
# 1  3  c
# 2  4  d
# 3  5  e
# 4  6  f

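The second data frame also consists of the two variables x1 and x2. However, it does not contain all rows of the first data frame: the rows with x1 equal to 1 and 2 are missing. It also contains one additional row (x1 = 6, x2 = f) that is not part of the first data frame.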
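To see how the two data frames overlap, one option is the base R merge function, which by default keeps only the rows whose values match in all shared columns. This is only a sketch to illustrate the overlap; the name data_common is chosen here for illustration.

```r
data1 <- data.frame(x1 = 1:5,          # Recreate first data frame
                    x2 = letters[1:5])
data2 <- data.frame(x1 = 3:6,          # Recreate second data frame
                    x2 = letters[3:6])
data_common <- merge(data1, data2)     # Rows contained in both data frames
data_common
#   x1 x2
# 1  3  c
# 2  4  d
# 3  5  e
```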

 

Example 1: Returning Rows that Only Exist in First Data Frame Using dplyr Package

The following code illustrates how to find those rows that are only contained in data frame 1, but not in data frame 2.

In this example, we’ll use the dplyr package. If we want to use the functions and commands of the dplyr package, we first need to install and load dplyr:

install.packages("dplyr")              # Install dplyr package
library("dplyr")                       # Load dplyr package

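Now, we can use the setdiff function. With dplyr loaded, setdiff compares data frames row by row and returns the rows of the first data frame that do not appear in the second: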

setdiff(data1, data2)                  # Applying setdiff function
#   x1 x2
# 1  1  a
# 2  2  b

As the previous output of the RStudio console shows, the first two rows of data frame 1 (x1 = 1 and x1 = 2) are not contained in data frame 2.
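If installing dplyr is not an option, a similar result can be obtained with base R. The following is a minimal sketch, not the method used in this tutorial: it pastes the values of each row into one key string and keeps the rows of the first data frame whose key does not occur in the second. The names key1 and key2 are chosen for illustration, and the sketch assumes that the separator "_" does not appear in the data itself.

```r
data1 <- data.frame(x1 = 1:5,          # Recreate first data frame
                    x2 = letters[1:5])
data2 <- data.frame(x1 = 3:6,          # Recreate second data frame
                    x2 = letters[3:6])
key1 <- do.call(paste, c(data1, sep = "_"))  # One key per row, e.g. "1_a"
key2 <- do.call(paste, c(data2, sep = "_"))
data1_only <- data1[!key1 %in% key2, ]       # Keep rows without a match
data1_only
#   x1 x2
# 1  1  a
# 2  2  b
```

Unlike the dplyr version of setdiff, this sketch keeps duplicated rows of the first data frame instead of removing them.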

Note that the last row of data frame 2 is not contained in data frame 1. However, this row is not shown in the previous output, since we are only interested in rows that are stored in data1 but not in data2.
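The direction of the comparison depends on the order of the arguments. As a short sketch, assuming dplyr is installed, swapping the two data frames returns the rows that exist only in the second data frame. The name data2_only is chosen for illustration.

```r
library("dplyr")                       # Assumes dplyr is installed

data1 <- data.frame(x1 = 1:5,          # Recreate first data frame
                    x2 = letters[1:5])
data2 <- data.frame(x1 = 3:6,          # Recreate second data frame
                    x2 = letters[3:6])
data2_only <- setdiff(data2, data1)    # Rows only in second data frame
data2_only                             # Single row with x1 = 6 and x2 = f
```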

 

Video, Further Resources & Summary

I have also published a video on my YouTube channel that explains the R programming syntax of this article.

Furthermore, you might want to read some of the other tutorials on this website, which cover related topics such as vectors, data inspection, and extracting data.

In this R tutorial you learned how to filter and extract rows that are only present in data frame No. 1. In case you have any further questions, let me know in the comments below.

 
