Apply unique Function to Multiple Columns in R (2 Examples)
On this page you’ll learn how to retain only data frame rows that are unique in certain variables in the R programming language.
Let’s just jump right in:
Creation of Example Data
As a first step, we have to construct some example data:
data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),               # Creating example data
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])
data                                                       # Printing example data
#   x1 x2 x3
# 1  1  a  A
# 2  1  a  B
# 3  1  b  C
# 4  2  b  D
# 5  2  b  E
# 6  2  c  F
The previous output of the RStudio console shows that our example data has six rows and three columns. Some combinations of values in the variables x1 and x2 occur in more than one row (i.e. rows 1 and 2 as well as rows 4 and 5).
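To check which rows repeat an earlier combination of x1 and x2, one option is to call duplicated() on those two columns. The following is only a minimal sketch; the object name dup_flags is an illustrative choice and not part of the original code.

```r
# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])

# TRUE marks rows whose x1/x2 combination already appeared in an earlier row
dup_flags <- duplicated(data[ , c("x1", "x2")])            # dup_flags is a hypothetical name
dup_flags
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

sum(dup_flags)                                             # Number of repeated rows
# [1] 2
```

Rows 2 and 5 are flagged because they repeat the combinations found in rows 1 and 4, respectively.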
Example 1: Select Unique Data Frame Rows Using unique() Function
In this example, I’ll show how to apply the unique function based on multiple variables of our example data frame.
Have a look at the following R code and its output:
data_unique <- unique(data[ , c("x1", "x2")])              # Apply unique
data_unique                                                # Print new data frame
#   x1 x2
# 1  1  a
# 3  1  b
# 4  2  b
# 6  2  c
The previous R syntax created a new data frame that only consists of unique rows in the variables x1 and x2.
It is important to note that the previous R code also dropped the column x3. In the next example, I’ll explain how to keep the variable x3 in our output data frame.
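One more detail about Example 1: the row names 1, 3, 4, and 6 in the output are carried over from the original data frame. If consecutive row names are preferred, a sketch like the following might help. Note that data_unique_reset is an illustrative name that does not appear in the original code.

```r
# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])

data_unique <- unique(data[ , c("x1", "x2")])              # Same call as in Example 1
rownames(data_unique)                                      # Row names of the original rows
# [1] "1" "3" "4" "6"

data_unique_reset <- data_unique                           # Hypothetical copy for illustration
rownames(data_unique_reset) <- NULL                        # Reset to consecutive row names
data_unique_reset
#   x1 x2
# 1  1  a
# 2  1  b
# 3  2  b
# 4  2  c
```

Keeping the original row names can be handy for tracing each unique row back to the input data, so resetting them is optional.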
Example 2: Select Unique Data Frame Rows Using duplicated() Function
In Example 2, I’ll illustrate how to use the duplicated function to create a data frame subset that is unique in specific columns. Consider the following R code:
data_duplicated <- data[!duplicated(data[ , c("x1", "x2")]), ]   # Apply duplicated
data_duplicated                                                  # Print new data frame
#   x1 x2 x3
# 1  1  a  A
# 3  1  b  C
# 4  2  b  D
# 6  2  c  F
As you can see, the retained rows are the same as in Example 1. However, this time we also kept the variable x3 that was not used for the identification of unique rows.
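Note that the code of Example 2 keeps the first row of each x1/x2 combination, which is why x3 contains A and D rather than B and E. If the last occurrence should be kept instead, the fromLast argument of duplicated() can be used. The following is a minimal sketch of this variation; data_last is an illustrative name that is not part of the original code.

```r
# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1, 1, 1, 2, 2, 2),
                   x2 = c("a", "a", "b", "b", "b", "c"),
                   x3 = LETTERS[1:6])

# fromLast = TRUE scans from the bottom, so the last row of each combination is kept
data_last <- data[!duplicated(data[ , c("x1", "x2")], fromLast = TRUE), ]   # Hypothetical name
data_last
#   x1 x2 x3
# 2  1  a  B
# 3  1  b  C
# 5  2  b  E
# 6  2  c  F
```

Which occurrence to keep depends on the data at hand, e.g. whether the rows are sorted so that the most relevant record comes first or last.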
Video, Further Resources & Summary
I have recently released a video on my YouTube channel, which illustrates the R programming code of the present tutorial. You can find the video below.
In addition to the video, I recommend reading the related tutorials on my website. I have released several tutorials already:
- Unique Rows of Data Frame Based On Selected Columns
- Drop Multiple Columns from Data Frame Using dplyr Package
- List of R Commands (+ Examples)
- R Programming Language
In this R tutorial, you learned how to keep only data frame rows that are not duplicated in particular columns. Let me know in the comments section below in case you have additional questions. Furthermore, please subscribe to my email newsletter to receive regular updates on new posts.