Apply unique Function to Multiple Columns in R (2 Examples)

On this page you’ll learn how to retain only data frame rows that are unique in certain variables in the R programming language.

The article looks as follows:

1) Creation of Example Data

2) Example 1: Select Unique Data Frame Rows Using unique() Function

3) Example 2: Select Unique Data Frame Rows Using duplicated() Function

4) Video, Further Resources & Summary


Let’s just jump right in:

Creation of Example Data

As a first step, we have to construct some example data:

data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),                    # Creating example data
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])
data                                                            # Printing example data
#   x1 x2 x3
# 1  1  a  A
# 2  1  a  B
# 3  1  b  C
# 4  2  b  D
# 5  2  b  E
# 6  2  c  F

The previous output of the RStudio console shows that our example data has six rows and three columns. Some combinations of x1 and x2 occur more than once: rows 1 and 2 share the values 1 and a, and rows 4 and 5 share the values 2 and b.
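Before removing anything, it can help to check which rows repeat an earlier combination of x1 and x2. The following is a minimal sketch using the duplicated function; the object name dup_flag is only chosen for illustration and is not used elsewhere in this tutorial.

```r
data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),                    # Recreating example data
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])

dup_flag <- duplicated(data[ , c("x1", "x2")])                  # TRUE for repeated combinations
dup_flag                                                        # Print logical vector
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

sum(dup_flag)                                                   # Count repeated rows
# [1] 2
```

Rows 2 and 5 are flagged, because each of them repeats the combination of the row directly above.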

Example 1: Select Unique Data Frame Rows Using unique() Function

In this example, I’ll show how to apply the unique function based on multiple variables of our example data frame.

Have a look at the following R code and its output:

data_unique <- unique(data[ , c("x1", "x2")])                   # Apply unique
data_unique                                                     # Print new data frame
#   x1 x2
# 1  1  a
# 3  1  b
# 4  2  b
# 6  2  c

The previous R syntax created a new data frame that only consists of unique rows in the variables x1 and x2.

It is important to note that the previous R code also dropped the column x3, because we selected only x1 and x2 before applying the unique function. In the next example, I’ll explain how to keep the variable x3 in our output data frame.
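One more detail of the unique() output: the row names 1, 3, 4, and 6 are carried over from the original data frame. If consecutive row names are preferred, they can be reset afterwards. The code below is a small sketch of this optional step, which the example itself does not require:

```r
data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),                    # Recreating example data
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])

data_unique <- unique(data[ , c("x1", "x2")])                   # Apply unique
rownames(data_unique)                                           # Row names of original data
# [1] "1" "3" "4" "6"

rownames(data_unique) <- NULL                                   # Reset row names
data_unique                                                     # Print data frame again
#   x1 x2
# 1  1  a
# 2  1  b
# 3  2  b
# 4  2  c
```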

Example 2: Select Unique Data Frame Rows Using duplicated() Function

In Example 2, I’ll illustrate how to use the duplicated function to create a data frame subset that is unique in specific columns. Consider the following R code:

data_duplicated <- data[!duplicated(data[ , c("x1", "x2")]), ]  # Apply duplicated
data_duplicated                                                 # Print new data frame
#   x1 x2 x3
# 1  1  a  A
# 3  1  b  C
# 4  2  b  D
# 6  2  c  F

As you can see, the retained rows are the same as in Example 1. However, this time we also kept the variable x3 that was not used for the identification of unique rows.
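By default, the duplicated function flags the second and later occurrences, so the first row of each x1 and x2 combination is the one that survives. If the last row of each combination should be kept instead, the fromLast argument of duplicated can be used. The following sketch shows this variation; the name data_last is only an illustrative choice:

```r
data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),                    # Recreating example data
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])

data_last <- data[!duplicated(data[ , c("x1", "x2")],           # Keep last occurrence
                              fromLast = TRUE), ]
data_last                                                       # Print new data frame
#   x1 x2 x3
# 2  1  a  B
# 3  1  b  C
# 5  2  b  E
# 6  2  c  F
```

The combinations in x1 and x2 are the same as before, but x3 now contains the values of the last matching rows (B and E instead of A and D).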

Video, Further Resources & Summary

I have recently released a video on my YouTube channel, which illustrates the R programming code of the present tutorial. You can find the video below.

In addition to the video, I recommend reading the related tutorials on my website.

In this R tutorial, you learned how to keep only data frame rows that are not duplicated in particular columns. Let me know in the comments section below in case you have additional questions. Furthermore, please subscribe to my email newsletter to receive regular updates on new posts.