Select Row with Maximum or Minimum Value in Each Group in R (Example) | dplyr vs. data.table Packages

 

This page explains how to return the highest or lowest values within each group of a data frame in the R programming language.

Here’s the step-by-step process:

 

Exemplifying Data

First, let’s create some example data in R:

data <- data.frame(x = 1:10,                    # Create example data
                   group = c(rep("A", 2),
                             rep("B", 3),
                             rep("C", 5)))
data                                            # Print example data
#     x group
# 1   1     A
# 2   2     A
# 3   3     B
# 4   4     B
# 5   5     B
# 6   6     C
# 7   7     C
# 8   8     C
# 9   9     C
# 10 10     C

Our example data is a data frame with ten rows and two columns. The variable x is numeric and contains values ranging from 1 to 10. The variable group is our grouping indicator and contains three different group values (i.e. A, B, and C).
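
As a quick sanity check, the structure described above can be confirmed with a few base R calls. This is only a minimal sketch; it recreates the same example data so that it runs on its own:

```r
# Recreate the example data from above
data <- data.frame(x = 1:10,
                   group = c(rep("A", 2),
                             rep("B", 3),
                             rep("C", 5)))

dims <- dim(data)                  # Number of rows and columns
group_sizes <- table(data$group)   # Number of rows per group
x_range <- range(data$x)           # Smallest and largest value of x

dims                               # 10 rows, 2 columns
group_sizes                        # A: 2, B: 3, C: 5
x_range                            # 1 and 10
```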

Now, let’s find the maximum and minimum values of each group!

 

Example 1: Max in Group with dplyr Package

In Example 1, I’m using the dplyr package to select the rows with the maximum value within each group. First, we need to install and load the package:

install.packages("dplyr")                       # Install dplyr package
library("dplyr")                                # Load dplyr package

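Now, we can use the group_by and the top_n functions to find the highest numeric value of each group. A negative n, i.e. top_n(-1, x), would return the lowest value instead. Note that top_n keeps all tied rows, so a group can return more than one row if its maximum occurs several times: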

data %>% group_by(group) %>% top_n(1, x)        # Apply dplyr functions
# # A tibble: 3 x 2
# # Groups:   group [3]
#       x group
#   <int> <fct>
# 1     2 A    
# 2     5 B    
# 3    10 C

The RStudio console is showing the result of the previous R syntax: The maxima of A, B, and C are 2, 5, and 10, respectively.
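
In dplyr 1.0.0 and later, top_n has been superseded by slice_max and slice_min. The following is a minimal sketch of the same selection with these newer functions, assuming a dplyr version of at least 1.0.0 is installed. The object names max_rows and min_rows are only chosen for illustration:

```r
library("dplyr")                                # Load dplyr package

# Recreate the example data so the sketch runs on its own
data <- data.frame(x = 1:10,
                   group = c(rep("A", 2),
                             rep("B", 3),
                             rep("C", 5)))

max_rows <- data %>%                            # Row with highest x per group
  group_by(group) %>%
  slice_max(x, n = 1) %>%
  ungroup()

min_rows <- data %>%                            # Row with lowest x per group
  group_by(group) %>%
  slice_min(x, n = 1) %>%
  ungroup()

max_rows$x                                      # 2, 5, 10
min_rows$x                                      # 1, 3, 6
```

Like top_n, both functions keep tied rows by default; setting with_ties = FALSE returns exactly one row per group.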

 

Example 2: Max / Min in Group with data.table Package

Example 2 shows how to return max and min values of each group using functions of the data.table package. Let’s install and load the package:

install.packages("data.table")                  # Install data.table package
library("data.table")                           # Load data.table package

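Now, we can use the setDT and which.max functions to find the maximum value within each group. Note that setDT converts our data frame to a data.table by reference, i.e. without creating a copy. As expected, the values are the same as in Example 1: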

setDT(data)[ , .SD[which.max(x)], by = group]   # Apply data.table functions
#    group  x
# 1:     A  2
# 2:     B  5
# 3:     C 10

If we want to find the minimum value within each group, we can simply replace which.max with which.min:

setDT(data)[ , .SD[which.min(x)], by = group]   # Min of groups
#    group x
# 1:     A 1
# 2:     B 3
# 3:     C 6

The lowest values are 1, 3, and 6, respectively.
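
One detail worth knowing: which.max and which.min return only the position of the first maximum or minimum, so tied rows are dropped. The sketch below illustrates the difference on a small hypothetical data set with ties (dt_ties is not part of the example data above) and shows one possible way to keep all tied rows by using a logical condition instead:

```r
library("data.table")                           # Load data.table package

# Hypothetical data with tied maxima in each group
dt_ties <- data.table(x = c(1, 2, 2, 3, 5, 5),
                      group = c("A", "A", "A", "B", "B", "B"))

first_max <- dt_ties[ , .SD[which.max(x)], by = group]   # First max only
all_max <- dt_ties[ , .SD[x == max(x)], by = group]      # All tied rows

nrow(first_max)                                 # 2 rows, one per group
nrow(all_max)                                   # 4 rows, ties are kept
```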

 

Video, Further Resources & Summary

 

Furthermore, you might read some of the related tutorials on my homepage. I have published several posts already.

 

In summary: This page showed how to select the row with the maximum or minimum value of each group in R. Let me know in the comments if you have any additional questions.

 


