Unique Rows of Data Frame Based On Selected Columns in R (Example)
This tutorial explains how to extract the rows of a data frame that are unique with respect to selected columns in the R programming language.
Let’s dig in.
Example Data
Have a look at the following example data:
data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),    # Example data
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])
data                                             # Print example data
#   id1 id2 x
# 1   1   1 a
# 2   1   1 b
# 3   1   2 c
# 4   2   2 d
# 5   2   2 e
# 6   3   4 f
Have a look at the previous RStudio console output. It shows that our example data contains six rows and three columns. Two of the variables are IDs, and the third variable contains the first six lowercase letters.
Example: Removing Rows Duplicated in Certain Variables
Let’s assume that we want to keep only rows that are unique in the two ID columns. Then, we can use the duplicated function as shown below:
data_new <- data[!duplicated(data[ , c("id1", "id2")]), ]    # Delete rows
data_new                                                     # Print new data
#   id1 id2 x
# 1   1   1 a
# 3   1   2 c
# 4   2   2 d
# 6   3   4 f
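As you can see, we retained only the rows of our input data frame that are unique in id1 and id2. For each duplicated combination, the first occurrence is kept and the later ones are removed.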
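To see why exactly these four rows remain, it can help to look at the logical vector that duplicated returns before it is negated. The following is a minimal sketch that rebuilds the example data so that it runs on its own. The object names dup_flags and id_combos are chosen only for illustration and are not part of the original code.

```r
# Rebuild the example data so that this block is self-contained
data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])

# TRUE marks a row whose id1/id2 combination already appeared further up
dup_flags <- duplicated(data[ , c("id1", "id2")])
dup_flags
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

# Negating the flags keeps the first occurrence of each combination
data_new <- data[!dup_flags, ]

# If only the ID combinations are needed (without x), unique() is enough
id_combos <- unique(data[ , c("id1", "id2")])
```

Rows 2 and 5 are flagged as TRUE because they repeat the ID combinations of rows 1 and 4, which is why they are dropped from data_new.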
Video & Further Resources
I have recently published a video on my YouTube channel, which illustrates the R code of this tutorial. You can find the video below:
The YouTube video will be added soon.
Furthermore, you might want to read the other tutorials that I have published on www.statisticsglobe.com.
- Remove Duplicated Rows from Data Frame
- Repeat Rows of Data Frame N Times
- Select First Row of Each Group in Data Frame
- Remove First Row of Data Frame
- unique Function in R
- The R Programming Language
In this R tutorial you learned how to remove duplicates in specific columns and how to filter for unique combinations. Don’t hesitate to let me know in the comments, in case you have any additional questions.
2 Comments
Thanks Joachim.
Why did it choose to keep rows 1 and 4 instead of rows 2 and 5, respectively?
Keeping rows 2 and 5 would also have removed duplicates, but results in column x would differ.
Hi Jeff,
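This is the default behavior of the duplicated function: it flags every occurrence after the first one as a duplicate, so the first row of each ID combination is kept. Please have a look here for more details on how to keep the last occurrence of a duplicate value.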
Regards,
Joachim
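As a minimal sketch of the point discussed in the comments above: the duplicated function has a fromLast argument, and setting it to TRUE keeps the last occurrence of each ID combination instead of the first. The example rebuilds the data so that it runs on its own, and the name data_last is chosen only for illustration.

```r
# Rebuild the example data so that this block is self-contained
data <- data.frame(id1 = c(1, 1, 1, 2, 2, 3),
                   id2 = c(1, 1, 2, 2, 2, 4),
                   x = letters[1:6])

# fromLast = TRUE scans from the bottom, so earlier repeats are flagged
data_last <- data[!duplicated(data[ , c("id1", "id2")], fromLast = TRUE), ]
data_last
#   id1 id2 x
# 2   1   1 b
# 3   1   2 c
# 5   2   2 e
# 6   3   4 f
```

Compared with the default, rows 2 and 5 are now retained in place of rows 1 and 4, so column x contains b and e rather than a and d.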