Unique Rows of Data Frame Based On Selected Columns in R (Example)

This tutorial explains how to keep only those rows of a data frame that are unique with respect to selected columns in the R programming language.

The content looks as follows:

1) Example Data

2) Example: Removing Rows Duplicated in Certain Variables

3) Video & Further Resources

Let’s dig in.

Example Data

Have a look at the following example data:

data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),              # Example data
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])
data                                                       # Print example data
#   id1 id2 x
# 1   1   1 a
# 2   1   1 b
# 3   1   2 c
# 4   2   2 d
# 5   2   2 e
# 6   3   4 f

The previous RStudio console output shows that our example data contains six rows and three columns. Two of the variables are IDs, and the third variable contains the letters a to f.

Example: Removing Rows Duplicated in Certain Variables

Let’s assume that we want to keep only one row for each combination of values in the two ID columns. Then, we can use the duplicated function as shown below:

data_new <- data[!duplicated(data[ , c("id1", "id2")]), ]  # Delete rows
data_new                                                   # Print new data
#   id1 id2 x
# 1   1   1 a
# 3   1   2 c
# 4   2   2 d
# 6   3   4 f

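As you can see, we retained only the first row for each unique combination of id1 and id2. Rows 2 and 5 were removed, because their ID values had already appeared in rows 1 and 4.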
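
To see why exactly these rows were kept, it can help to inspect the logical vector that the duplicated function returns for the ID columns. The following is a minimal sketch based on the example data of this tutorial; the object names dup_flags and data_last are chosen only for illustration. It also shows the fromLast argument of duplicated, which reverses the direction of the check, so that the last row of each ID combination is kept instead of the first.

```r
data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),              # Example data of this tutorial
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])

dup_flags <- duplicated(data[ , c("id1", "id2")])          # TRUE = ID combination seen before
dup_flags                                                  # Print logical vector
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

data_last <- data[!duplicated(data[ , c("id1", "id2")],    # Keep last occurrence instead
                              fromLast = TRUE), ]
data_last                                                  # Print new data
#   id1 id2 x
# 2   1   1 b
# 3   1   2 c
# 5   2   2 e
# 6   3   4 f
```

Both variants return four rows, but the values in column x differ, so which one is preferable depends on which row of each duplicated group should be kept.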

Video & Further Resources

I have recently published a video on my YouTube channel, which illustrates the R code of this tutorial. You can find the video below:

The YouTube video will be added soon.

Furthermore, you might want to read the other tutorials which I have published on www.statisticsglobe.com.

In this R tutorial, you learned how to remove duplicates in specific columns and how to filter for unique combinations. Please let me know in the comments if you have any additional questions.

2 Comments

Jeff Norriss
September 13, 2022 3:26 am

Thanks Joachim.
Why did it choose to keep rows 1 and 4 instead of rows 2 and 5, respectively?
Keeping rows 2 and 5 would also have removed duplicates, but results in column x would differ.

- Joachim
  September 19, 2022 11:12 am
  
  Hi Jeff,
  
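  This is the default behavior of the duplicated function: it flags the second and any later occurrence of a value combination, so the first occurrence is the one that is kept. Please have a look here for more details on how to keep the last occurrence of a duplicate value.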
  
  Regards,
  Joachim
  