Unique Rows of Data Frame Based On Selected Columns in R (Example)

 

This tutorial explains how to keep only those rows of a data frame that are unique with respect to selected columns in the R programming language.

Let’s dig in.

 

Example Data

Have a look at the following example data:

data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),              # Example data
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])
data                                                       # Print example data
#   id1 id2 x
# 1   1   1 a
# 2   1   1 b
# 3   1   2 c
# 4   2   2 d
# 5   2   2 e
# 6   3   4 f

The previous RStudio console output shows that our example data contains six rows and three columns. Two of the variables are IDs, and the third variable contains the first six letters of the alphabet.

 

Example: Removing Rows Duplicated in Certain Variables

Let’s assume that we want to keep only one row for each unique combination of the two ID columns. Then, we can use the duplicated function as shown below:

data_new <- data[!duplicated(data[ , c("id1", "id2")]), ]  # Delete rows
data_new                                                   # Print new data
#   id1 id2 x
# 1   1   1 a
# 3   1   2 c
# 4   2   2 d
# 6   3   4 f

As you can see, we have retained only the first row for each unique combination of id1 and id2 in our input data frame.
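
To see why exactly these rows were kept, it may help to look at the logical vector that duplicated returns. The following sketch recreates the example data from above; the object names dup_first, dup_last, and data_last are only chosen for illustration. It also uses the fromLast argument of duplicated, which flags duplicates starting from the bottom of the data frame, so that the last row of each ID combination is kept instead of the first:

```r
data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),              # Recreate example data
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])

dup_first <- duplicated(data[ , c("id1", "id2")])          # TRUE = repeated ID combination
dup_first                                                  # Print logical vector
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

dup_last <- duplicated(data[ , c("id1", "id2")],           # Search from the last row upwards
                       fromLast = TRUE)
data_last <- data[!dup_last, ]                             # Keep last row of each combination
data_last                                                  # Print new data
#   id1 id2 x
# 2   1   1 b
# 3   1   2 c
# 5   2   2 e
# 6   3   4 f
```

Rows 2 and 5 are flagged as TRUE in the first vector, which is why they were dropped in the example above. With fromLast = TRUE, rows 1 and 4 are dropped instead, so the values in column x differ.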

 

Video & Further Resources

Furthermore, you might want to read the other tutorials that I have published on www.statisticsglobe.com.

 

In this R tutorial, you learned how to remove rows that are duplicated in specific columns and how to keep only unique combinations. Please let me know in the comments if you have any additional questions.

 



2 Comments

  • Thanks Joachim.
    Why did it choose to keep rows 1 and 4 instead of rows 2 and 5, respectively?
    Keeping rows 2 and 5 would also have removed duplicates, but results in column x would differ.

    • Hi Jeff,

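      This is the default behavior of the duplicated function: it flags every occurrence after the first one, so the first row of each ID combination is kept. To keep the last occurrence of a duplicate value instead, you can specify fromLast = TRUE within the duplicated function.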

      Regards,
      Joachim
