Remove Duplicated Rows from Data Frame in R (Example)
This tutorial illustrates how to eliminate duplicated rows from a data frame in R programming.
Let’s dig in…
Creation of Example Data
In the example of this tutorial, we’ll use the following data frame:
data <- data.frame(x1 = c(1:5, 2, 5),              # Create example data
                   x2 = c(letters[1:5], "b", "e"))
data                                               # Print example data
#   x1 x2
# 1  1  a
# 2  2  b
# 3  3  c
# 4  4  d
# 5  5  e
# 6  2  b
# 7  5  e
The RStudio console output illustrates the structure of our data. Our data frame consists of seven rows and two columns, whereby rows 2 and 5 are duplicated in rows 6 and 7.
Example: Delete Duplicated Rows from Data Frame
If we want to remove repeated rows from our example data, we can use the duplicated() R function. The duplicated function returns a logical vector with one element per row, which is TRUE for every row that repeats an earlier row and FALSE otherwise. By putting a bang (i.e. !) in front of the duplicated command, we negate this vector and can subset our data so that only unique rows remain:
data_unique <- data[!duplicated(data), ]           # Remove duplicated rows
data_unique                                        # Print unique data
#   x1 x2
# 1  1  a
# 2  2  b
# 3  3  c
# 4  4  d
# 5  5  e
As the previous output of the RStudio console shows, only the five unique rows remain in our data. Rows 6 and 7 were removed.
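To make the subsetting step more transparent, the following sketch prints the logical vector that duplicated() returns for our example data before it is negated. The object name dup_flag is only chosen for illustration, and the comparison with unique() at the end is an additional check that is not part of the example above:

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = c(1:5, 2, 5),
                   x2 = c(letters[1:5], "b", "e"))

dup_flag <- duplicated(data)          # TRUE for rows that repeat an earlier row
dup_flag                              # Print logical vector
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

which(dup_flag)                       # Positions of the repeated rows
# [1] 6 7

data_unique <- data[!dup_flag, ]      # Keep first occurrence of each row

identical(data_unique, unique(data))  # Compare with unique() function
# [1] TRUE
```

Note that duplicated() flags only the later occurrences of a row, so the first occurrence of each repeated row is kept in data_unique.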
Video & Further Resources
Would you like to know more about the removal of duplicated rows from a data frame? Then you might watch the following video on my YouTube channel. In the video, I show the R programming code of this tutorial and explain how to find and remove duplicates in more detail.
In addition, you might read the other posts on my website. A selection of related articles is listed here:
- Remove Duplicates with dplyr Package
- Subset Data Frame Rows by Logical Condition in R
- unique Function in R
- The R Programming Language
Summary: At this point of the tutorial, you should have learned how to identify and remove duplicated rows from a data frame in the R programming language. Let me know in the comments section below in case you have any further questions.