Delete Duplicate Rows Based On Column Values in R (Example)

 

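In this article, I’ll demonstrate how to delete rows with duplicated values in one column of a data frame in R, and how to control which of the duplicated rows is kept based on the values of another column.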


So without further ado, let’s jump right in!

 

Creation of Exemplifying Data

The first step is to define some data that we can use in the examples below:

data <- data.frame(x = c("a", "a", "a", "a", "b", "b", "c"),  # Create example data
                   y = c(3, 1, 4, 2, 1, 2, 1))
data                                                          # Print example data

 

Table 1: Example data frame
  x y
1 a 3
2 a 1
3 a 4
4 a 2
5 b 1
6 b 2
7 c 1

 

Table 1 shows the output returned to the RStudio console after executing the previous R syntax: our example data has seven rows and two columns called “x” and “y”.
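Before removing anything, it can be useful to check how many duplicates a column actually contains. The following sketch uses the base R functions table(), anyDuplicated() and duplicated(); it recreates the example data so that it can be run on its own.

```r
# Recreate the example data so this block is self-contained
data <- data.frame(x = c("a", "a", "a", "a", "b", "b", "c"),
                   y = c(3, 1, 4, 2, 1, 2, 1))

table(data$x)             # Frequency of each value in x: a = 4, b = 2, c = 1
anyDuplicated(data$x)     # Index of the first duplicated element (0 if there is none): 2
sum(duplicated(data$x))   # Number of rows that removing duplicates would drop: 4
```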

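As you can see, the variable x contains several duplicates: the value “a” appears four times and “b” twice. The duplicated() function flags every repeat of a value that has already occurred earlier in the column, so negating it with ! keeps only the first row for each value of x: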

data_default <- data[!duplicated(data$x), ]                   # Extract unique rows
data_default                                                  # Print data with unique rows

 

Table 2: Data frame with unique values in x
  x y
1 a 3
5 b 1
7 c 1

 

As shown in Table 2, the previous code has created a data set containing each value in the column x only once.
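To see what the previous code does internally, we can print the logical vector returned by duplicated(). The sketch below also uses the fromLast argument of duplicated(), which scans the vector from the end and therefore keeps the last instead of the first occurrence of each value. Note that neither the first nor the last occurrence is necessarily the row with the highest value in y.

```r
data <- data.frame(x = c("a", "a", "a", "a", "b", "b", "c"),
                   y = c(3, 1, 4, 2, 1, 2, 1))

# TRUE marks each value that already occurred earlier in x
duplicated(data$x)        # FALSE TRUE TRUE TRUE FALSE TRUE FALSE

# Negating the vector keeps the first occurrence of each value (rows 1, 5, 7)
data[!duplicated(data$x), ]

# fromLast = TRUE scans from the end, so the last occurrence is kept (rows 4, 6, 7)
data[!duplicated(data$x, fromLast = TRUE), ]
```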

However, you can also see that this approach always keeps the first row for each value of x, regardless of the values in y. In the following example, I’ll explain how to choose which row is kept based on another column.

 

Example: Subset of Unique Rows with Highest Values in Column

The following R programming syntax shows how to keep, for each value of x, the row with the highest value in y.

To do this, we first sort our data set by the column y in decreasing order:

data_ordered <- data[order(data$y, decreasing = TRUE), ]      # Order data
data_ordered                                                  # Print ordered data

 

Table 3: Data frame sorted by y in decreasing order
  x y
3 a 4
1 a 3
4 a 2
6 b 2
2 a 1
5 b 1
7 c 1

 

Table 3 shows the output of the previous R code: a data frame sorted in decreasing order of the variable y.
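Note that the column y contains ties (for example, rows 4 and 6 both have the value 2), and order() leaves tied rows in their original order. If you prefer a fully specified sort order, one option is to pass several sort keys to order(). This is a variation on the code above, not the approach used in this tutorial: the sketch sorts by x first and then by decreasing y within each value of x. The minus sign reverses the order of y and only works for numeric columns.

```r
data <- data.frame(x = c("a", "a", "a", "a", "b", "b", "c"),
                   y = c(3, 1, 4, 2, 1, 2, 1))

# Sort by x (ascending), then by y (descending) within each value of x
data_grouped <- data[order(data$x, -data$y), ]
data_grouped              # Row order: 3, 1, 4, 2, 6, 5, 7
```

Because the largest y comes first within each group, applying !duplicated(data_grouped$x) to this data frame keeps the same rows as the approach shown in this tutorial.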

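In the next step, we remove the duplicates in x from the sorted data. Since the highest values in y now come first, duplicated() keeps the row with the highest y for each value of x: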

data_highest <- data_ordered[!duplicated(data_ordered$x), ]   # Unique rows of ordered data
data_highest                                                  # Print unique rows of ordered data

 

Table 4: Unique values in x with the highest corresponding y
  x y
3 a 4
6 b 2
7 c 1

 

Table 4 shows the output of the previous R syntax: A data frame subset with unique elements in x and the highest corresponding values in y.
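If you need this operation repeatedly, the two steps can be wrapped in a small function. The helper keep_max_by() below is hypothetical (it is not part of base R or any package) and simply combines order() and duplicated() as shown above. For comparison, the block also computes the maximum of y per value of x with the base R function aggregate().

```r
data <- data.frame(x = c("a", "a", "a", "a", "b", "b", "c"),
                   y = c(3, 1, 4, 2, 1, 2, 1))

# Hypothetical helper: for each value of the column `group`, keep the row
# with the largest value in the column `value` (column names given as strings)
keep_max_by <- function(df, group, value) {
  df_ordered <- df[order(df[[value]], decreasing = TRUE), ]  # Largest values first
  df_ordered[!duplicated(df_ordered[[group]]), ]             # First row per group
}

keep_max_by(data, "x", "y")                 # Rows 3, 6 and 7

# For comparison: aggregate() returns only the grouping and summarized columns
aggregate(y ~ x, data = data, FUN = max)
```

Both calls give the same values here, but aggregate() only returns x and the maximum of y, whereas the order() and duplicated() approach keeps complete rows. That difference matters as soon as your data frame contains additional variables.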

 

Video, Further Resources & Summary

I have recently published a video on my YouTube channel, which demonstrates the R programming code of this tutorial.

 

 

Furthermore, you may read the other articles that I have published on this website.

 

In this R tutorial you have learned how to remove duplicate rows based on the values of one column and how to keep the row with the highest value in another column. Don’t hesitate to let me know in the comments section in case you have additional questions.

 
