Select Row with Maximum or Minimum Value in Each Group in R (Example) | dplyr vs. data.table Packages
This page explains how to return the highest or lowest values within each group of a data frame in the R programming language.
Table of contents:
- Exemplifying Data
- Example 1: Max in Group with dplyr Package
- Example 2: Max / Min in Group with data.table Package
- Video, Further Resources & Summary
Here’s the step-by-step process:
First, let’s create some example data in R:
data <- data.frame(x = 1:10,                       # Create example data
                   group = c(rep("A", 2), rep("B", 3), rep("C", 5)))
data                                               # Print example data
#     x group
# 1   1     A
# 2   2     A
# 3   3     B
# 4   4     B
# 5   5     B
# 6   6     C
# 7   7     C
# 8   8     C
# 9   9     C
# 10 10     C
Our example data is a data frame with ten rows and two columns. The variable x is numeric and contains values ranging from 1 to 10. The variable group is our grouping indicator and contains three different group values (i.e. A, B, and C).
Now, let’s find the maximum and minimum values of each group!
Example 1: Max in Group with dplyr Package
In Example 1, I’m using the dplyr package to select the rows with the maximum value within each group. First, we need to install and load the package:
install.packages("dplyr")                          # Install dplyr package
library("dplyr")                                   # Load dplyr package
Now, we can use the group_by and top_n functions to find the highest numeric value of each group:
data %>% group_by(group) %>% top_n(1, x)           # Apply dplyr functions
# # A tibble: 3 x 2
# # Groups:   group
#       x group
#   <int> <fct>
# 1     2 A
# 2     5 B
# 3    10 C
The RStudio console is showing the result of the previous R syntax: The maxima of A, B, and C are 2, 5, and 10, respectively.
UPDATE: Note that top_n has been superseded in favor of slice_min()/slice_max(). The following R code should therefore be preferred:
data %>% group_by(group) %>% slice_max(x, n = 1)   # Preferred alternative to top_n
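The same pattern also covers the minimum of each group. The following is a minimal sketch, assuming dplyr version 1.0.0 or later (where slice_min and slice_max are available); the object names data_min and data_max are chosen for illustration only. By default, both functions keep all tied rows, so with_ties = FALSE is set here to return exactly one row per group.

```r
library("dplyr")                                   # Load dplyr package

data <- data.frame(x = 1:10,                       # Recreate example data
                   group = c(rep("A", 2), rep("B", 3), rep("C", 5)))

data_min <- data %>%                               # Row with lowest x per group
  group_by(group) %>%
  slice_min(x, n = 1, with_ties = FALSE) %>%
  ungroup()

data_max <- data %>%                               # Row with highest x per group
  group_by(group) %>%
  slice_max(x, n = 1, with_ties = FALSE) %>%
  ungroup()

data_min$x                                         # Minima of A, B, and C
data_max$x                                         # Maxima of A, B, and C
```

For the example data, data_min should contain the values 1, 3, and 6, and data_max the values 2, 5, and 10.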
Example 2: Max / Min in Group with data.table Package
Example 2 shows how to return max and min values of each group using functions of the data.table package. Let’s install and load the package:
install.packages("data.table")                     # Install data.table package
library("data.table")                              # Load data.table package
Now, we can use the setDT and which.max functions to find the maximum value within each group. As expected, the values are the same as in Example 1:
setDT(data)[ , .SD[which.max(x)], by = group]      # Apply data.table functions
#    group  x
# 1:     A  2
# 2:     B  5
# 3:     C 10
If we want to find the minimum value within each group, we can simply replace which.max with which.min:
setDT(data)[ , .SD[which.min(x)], by = group]      # Min of groups
#    group x
# 1:     A 1
# 2:     B 3
# 3:     C 6
The lowest values are 1, 3, and 6, respectively.
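Note that which.max and which.min return only the position of the first extreme value, so additional rows with the same value are dropped. The sketch below illustrates this with a small hypothetical data set containing a tie (data_ties is not part of the example data above) and compares which.max with the logical condition x == max(x), which keeps all tied rows.

```r
library("data.table")                              # Load data.table package

data_ties <- data.table(x = c(1, 2, 2, 5, 3),      # Hypothetical data with a tie in group A
                        group = c("A", "A", "A", "B", "B"))

first_max <- data_ties[ , .SD[which.max(x)], by = group]  # First maximum per group only
all_max <- data_ties[ , .SD[x == max(x)], by = group]     # All rows tied for the maximum

nrow(first_max)                                    # One row per group
nrow(all_max)                                      # Group A contributes two rows
```

Which variant is appropriate depends on whether tied rows should count as separate results or whether a single representative row per group is enough.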
Video, Further Resources & Summary
If you need more information on the R programming syntax of this article, you might watch the following video of my YouTube channel. I illustrate the R code of this page in the video.
Furthermore, you might read some of the related tutorials on my homepage. I have published several posts already.
In summary: This page showed how to select the top or bottom row of each group in R. Let me know in the comments if you have any additional questions.