Unique Rows of Data Frame Based On Selected Columns in R (Example)

 

This tutorial explains how to keep only those rows of a data frame that are unique with respect to selected columns in the R programming language.

Let’s dig in.

 

Example Data

Have a look at the following example data:

data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),              # Example data
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])
data                                                       # Print example data
#   id1 id2 x
# 1   1   1 a
# 2   1   1 b
# 3   1   2 c
# 4   2   2 d
# 5   2   2 e
# 6   3   4 f

Have a look at the previous RStudio console output. It shows that our example data contains six rows and three columns. Two of the variables (id1 and id2) are IDs, and the variable x contains the first six lowercase letters. Note that rows 1 and 2 as well as rows 4 and 5 share the same combination of ID values.
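Before removing anything, it can be helpful to check whether the ID columns contain repeated combinations at all. The following lines are a minimal sketch using the base R functions anyDuplicated and duplicated on the same example data; the object name id_cols is only chosen for illustration.

```r
data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),              # Example data
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])

id_cols <- data[ , c("id1", "id2")]                        # Only the ID columns (illustrative name)

anyDuplicated(id_cols)                                     # Index of first repeated row (0 if none)
# [1] 2

sum(duplicated(id_cols))                                   # Number of rows repeating an earlier ID pair
# [1] 2
```

A result of 0 from anyDuplicated would mean that every ID combination occurs only once, so no rows would have to be removed.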

 

Example: Removing Rows Duplicated in Certain Variables

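Let’s assume that we want to keep only rows that are unique in the two ID columns. The duplicated function returns TRUE for every row whose combination of id1 and id2 has already appeared in an earlier row. By negating this logical vector, we keep the first row of each ID combination, as shown below: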

data_new <- data[!duplicated(data[ , c("id1", "id2")]), ]  # Keep first row per ID pair
data_new                                                   # Print new data
#   id1 id2 x
# 1   1   1 a
# 3   1   2 c
# 4   2   2 d
# 6   3   4 f

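As you can see, we retained only the first row for each combination of id1 and id2 in our input data frame. Rows 2 and 5 were removed, and the remaining rows keep their original row names.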
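To see why exactly these rows are kept, it may help to look at the logical vector that duplicated returns. The sketch below also uses the fromLast argument of duplicated to keep the last row of each ID combination instead of the first one; the object names dup and data_last are only chosen for illustration.

```r
data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),              # Example data
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])

dup <- duplicated(data[ , c("id1", "id2")])                # TRUE = ID pair appeared before
dup                                                        # Print logical vector
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

data_last <- data[!duplicated(data[ , c("id1", "id2")],    # Keep last row per ID pair
                              fromLast = TRUE), ]
data_last                                                  # Print new data
#   id1 id2 x
# 2   1   1 b
# 3   1   2 c
# 5   2   2 e
# 6   3   4 f
```

Which variant is preferable depends on whether the first or the last occurrence of each ID combination carries the values you want to keep in the remaining columns.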

 

Video & Further Resources

A video illustrating the R code of this tutorial will be added to my YouTube channel soon.

 

Furthermore, you might read the other tutorials which I have published on www.statisticsglobe.com.

 

In this R tutorial, you learned how to remove rows that are duplicated in specific columns of a data frame. Don’t hesitate to let me know in the comments if you have any additional questions.

 


