distinct R Function of dplyr Package (Example)
In this post you’ll learn how to retain only unique rows of a data set with the distinct function of the dplyr package in R.
Table of contents:
- Creation of Example Data
- Example: Remove Duplicate Rows with distinct Function
- Video & Further Resources
Sound good? Here’s how to do it.
Creation of Example Data
To apply the distinct function in R, we first need to install and load the dplyr add-on package:
install.packages("dplyr")                       # Install and load dplyr
library("dplyr")
Furthermore, we need to create some example data:
data <- data.frame(x1 = c(1:5, 1),              # Create example data
                   x2 = c(letters[1:5], "a"))
data                                            # Print data to RStudio console
#   x1 x2
# 1  1  a
# 2  2  b
# 3  3  c
# 4  4  d
# 5  5  e
# 6  1  a
Our example data is a data.frame with six rows and two columns. The first and sixth rows are identical.
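As a quick check of that claim, the base R function duplicated can flag rows that repeat an earlier row. The following is only a small sketch; the object name dup_flags is chosen here for illustration and is not part of the original example:

```r
# Re-create the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1:5, 1),
                   x2 = c(letters[1:5], "a"))

# dup_flags is an illustrative name: TRUE marks a row that repeats an earlier row
dup_flags <- duplicated(data)
dup_flags                                       # Only the sixth row is flagged
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE

sum(dup_flags)                                  # Number of duplicated rows
# [1] 1
```

Note that duplicated marks the later copy (row six), not the first occurrence (row one).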
Note that we could also use a data set in tibble format. However, in the present R tutorial we’ll stick to a data frame.
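For readers who prefer tibbles, a minimal sketch of the same workflow could look as follows. It assumes that the dplyr package is installed; the object names data_tbl and unique_tbl are made up for this illustration:

```r
library("dplyr")                                # Load dplyr

# Re-create the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1:5, 1),
                   x2 = c(letters[1:5], "a"))

data_tbl <- as_tibble(data)                     # Convert data frame to tibble (illustrative name)
unique_tbl <- distinct(data_tbl)                # Remove duplicate rows from the tibble
nrow(unique_tbl)                                # Five unique rows remain
# [1] 5
```

The result is again a tibble, i.e. distinct keeps the class of its input.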
Example: Remove Duplicate Rows with distinct Function
We can remove duplicate rows from our example data with the distinct function as shown below:
distinct(data)                                  # Remove duplicate rows
#   x1 x2
# 1  1  a
# 2  2  b
# 3  3  c
# 4  4  d
# 5  5  e
The output of the distinct function is shown in the RStudio console: the same data frame as before, but this time without the duplicated sixth row.
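The distinct function can also judge uniqueness based on selected columns only. The sketch below goes slightly beyond the example above and uses a small, made-up data frame called data2 (not part of the original tutorial), because in our example data both columns are duplicated together:

```r
library("dplyr")                                # Load dplyr

# data2 is a hypothetical data set in which x1 repeats but x2 differs
data2 <- data.frame(x1 = c(1, 1, 2),
                    x2 = c("a", "b", "b"))

only_x1 <- distinct(data2, x1)                  # Unique values of x1; other columns are dropped
only_x1
#   x1
# 1  1
# 2  2

first_per_x1 <- distinct(data2, x1,             # Keep all columns of the first row
                         .keep_all = TRUE)      # for each unique value of x1
first_per_x1
#   x1 x2
# 1  1  a
# 2  2  b
```

With .keep_all = TRUE, only the first row for each value of x1 is retained, so the order of the rows in the input data matters.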
Video & Further Resources
If you need more explanations on the dplyr add-on package, you may want to watch the corresponding video on my YouTube channel, in which I show further functions of the dplyr package.
In addition, you could read some of the other articles on Statistics Globe. I have already published several articles about the dplyr package and data manipulation:
- How to Remove Duplicate Rows with Base R
- Introduction to the dplyr Package in R
- R Functions List (+ Examples)
- The R Programming Language
At this point you should have learned how to delete duplicated rows of data frames and tibbles with the dplyr package in R programming. In case you have any additional questions, let me know in the comments section.