distinct R Function of dplyr Package (Example)

 

In this post you’ll learn how to retain only unique rows of a data set with the distinct function of the dplyr package in R.

Sound good? Here’s how to do it.

 

Creation of Example Data

If we want to apply the distinct function in R, we first need to install and load the dplyr add-on package:

install.packages("dplyr")                        # Install and load dplyr
library("dplyr")

Furthermore, we need to create some example data:

data <- data.frame(x1 = c(1:5, 1),               # Create example data
                   x2 = c(letters[1:5], "a"))
data                                             # Print data to RStudio console
#   x1 x2
# 1  1  a
# 2  2  b
# 3  3  c
# 4  4  d
# 5  5  e
# 6  1  a

Our example data is a data.frame with six rows and two columns. The first and sixth rows are identical.
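
If you want to check which rows are duplicates before removing them, a minimal sketch with the base R function duplicated (not part of dplyr) could look as follows. The object name dup is only chosen for illustration, and the example data is recreated so that the code runs on its own:

```r
data <- data.frame(x1 = c(1:5, 1),               # Recreate example data
                   x2 = c(letters[1:5], "a"))
dup <- duplicated(data)                          # Flag rows that repeat an earlier row
dup                                              # Print logical vector
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE
```

Only the sixth element is TRUE, i.e. the sixth row repeats a row that appeared earlier in the data.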

Note that we could also use a data set in tibble format. However, in the present R tutorial we’ll stick to a data frame.
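
For illustration, the same example data could be stored as a tibble and passed to distinct in the same way. This is only a sketch: the names data_tbl and data_tbl_unique are hypothetical and not used elsewhere in this tutorial:

```r
library("dplyr")                                 # dplyr re-exports tibble()

data_tbl <- tibble(x1 = c(1:5, 1),               # Example data in tibble format
                   x2 = c(letters[1:5], "a"))
data_tbl_unique <- distinct(data_tbl)            # Remove duplicate rows
nrow(data_tbl_unique)                            # Count remaining rows
# [1] 5
```

The result is again a tibble, so the class of the input is retained.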

 

Example: Remove Duplicate Rows with distinct Function

We can remove duplicate rows from our example data with the distinct function as shown below:

distinct(data)                                   # Remove duplicate rows
#   x1 x2
# 1  1  a
# 2  2  b
# 3  3  c
# 4  4  d
# 5  5  e

You can see the output of the distinct function in the RStudio console: It is the same data frame as before, but this time without the duplicated sixth row. Note that distinct keeps the first occurrence of each unique row.
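
The distinct function can also check for duplicates in selected columns only. The following sketch uses a small hypothetical data frame called data2 (not part of the example above) to show the difference between the default behavior and the .keep_all argument:

```r
library("dplyr")

data2 <- data.frame(x1 = c(1, 1, 2, 2),          # Hypothetical data for illustration
                    x2 = c("a", "b", "c", "c"))

res_x1 <- distinct(data2, x1)                    # Unique values of x1, other columns dropped
res_x1
#   x1
# 1  1
# 2  2

res_all <- distinct(data2, x1, .keep_all = TRUE) # First row per x1 value, all columns kept
res_all
#   x1 x2
# 1  1  a
# 2  2  c
```

With .keep_all = TRUE, the remaining columns are taken from the first row of each group of duplicates.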

 

Video & Further Resources

If you need more explanations on the content of the dplyr add-on package, you may want to watch the following video on my YouTube channel. I show further functions of the dplyr package in the video:

 

 

In addition, you could read some of the other articles on Statistics Globe. I have already released several articles about the dplyr package and the manipulation of data.

 

At this point you should have learned how to delete duplicated rows of data frames and tibbles with the dplyr package in R programming. In case you have any additional questions, let me know in the comments section.

 
