Select Top N Highest Values by Group in R (3 Examples)

 

This tutorial explains how to extract the N highest values within each group of a data frame column in the R programming language.


Let’s dive into it!

 

Creation of Exemplifying Data

We’ll use the following data as the basis for this R programming language tutorial:

data <- data.frame(group = rep(letters[1:3], each = 5),    # Create example data
                   value = 1:15)
data                                                       # Print example data

 

Table 1: Example data frame with 15 rows and the two columns group and value.

 

As you can see based on Table 1, our example data is a data frame containing 15 rows and two columns. The variable group contains three different group indicators and the variable value contains the corresponding values.
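Before extracting any values, it can help to verify this structure directly. The following lines are only a minimal sketch: they recreate the example data and check its dimensions and the number of rows per group. The object name group_sizes is my own choice for illustration.

```r
data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)
dim(data)                                                  # Number of rows and columns
group_sizes <- table(data$group)                           # Count rows per group
group_sizes                                                # Print group sizes
```

Each of the three groups should contain five rows, so asking for the top three values per group leaves nine rows in total.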

 

Example 1: Extract Top N Highest Values by Group Using Base R

In Example 1, I’ll show how to return the N highest data points of each group using the basic installation of the R programming language.

For this, we first have to sort our data based on the value column in descending order:

data_new1 <- data[order(data$value, decreasing = TRUE), ]  # Order data descending

As a next step, we have to apply the Reduce, rbind, and head functions as shown below:

data_new1 <- Reduce(rbind,                                 # Top N highest values by group
                    by(data_new1,
                       data_new1["group"],
                       head,
                       n = 3))

The previous R code has created a new data frame object called data_new1. Let’s have a look at this data object:

data_new1                                                  # Print updated data

 

Table 2: Data frame subset containing the three highest values of each group.

 

The output of the previous R programming syntax is shown in Table 2: we have created a data frame subset containing only the three rows with the highest values in each group.
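As a possible alternative in base R (a sketch, not the method used in this example), the values can also be ranked within each group using the ave and rank functions, so that no sorting step is needed. The names n_top, rank_in_group, and data_alt1 are assumptions made for illustration only.

```r
data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)

n_top <- 3                                                 # Number of values to keep per group

rank_in_group <- ave(-data$value,                          # Rank values within each group,
                     data$group,                           # 1 = highest value of the group
                     FUN = function(x) rank(x, ties.method = "first"))

data_alt1 <- data[rank_in_group <= n_top, ]                # Keep top n_top rows per group
data_alt1                                                  # Print result
```

In contrast to the code of Example 1, this version keeps the rows in their original order instead of sorting them in descending order. With ties.method = "first", tied values are ranked by their position, so exactly n_top rows are kept for each group.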

 

Example 2: Extract Top N Highest Values by Group Using dplyr Package

This example shows how to keep only the N observations with the highest values by group using the functions of the dplyr package.

First, we need to install and load the dplyr add-on package:

install.packages("dplyr")                                  # Install dplyr package
library("dplyr")                                           # Load dplyr

Next, we can use the arrange, desc, group_by, and slice functions to return a tibble containing only the three highest values in each group:

data_new2 <- data %>%                                      # Top N highest values by group
  arrange(desc(value)) %>% 
  group_by(group) %>%
  slice(1:3)
data_new2                                                  # Print updated data
# # A tibble: 9 x 2
# # Groups:   group [3]
#   group value
#   <chr> <int>
# 1 a         5
# 2 a         4
# 3 a         3
# 4 b        10
# 5 b         9
# 6 b         8
# 7 c        15
# 8 c        14
# 9 c        13
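If a recent version of dplyr is installed (I assume version 1.0.0 or later here), the slice_max function offers a shorter way to express the same idea. The following code is a hedged sketch rather than part of the original example, and the object name data_alt2 is hypothetical.

```r
library("dplyr")                                           # Load dplyr

data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)

data_alt2 <- data %>%                                      # Top N highest values by group
  group_by(group) %>%
  slice_max(order_by = value, n = 3, with_ties = FALSE) %>%
  ungroup()                                                # Drop grouping afterwards
data_alt2                                                  # Print result
```

By default, slice_max uses with_ties = TRUE and may therefore return more than n rows per group when values are tied. Setting with_ties = FALSE returns at most n rows per group, which is closer to the behavior of slice(1:3) shown above.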

 

Example 3: Extract Top N Highest Values by Group Using data.table Package

In this example, I’ll show how to use the data.table package to retain only the highest N values of each data frame group.

First, we need to install and load the data.table package:

install.packages("data.table")                             # Install data.table package
library("data.table")                                      # Load data.table package

Now, we can apply the following R syntax to create a new data.table:

data_new3 <- data[order(data$value, decreasing = TRUE), ]  # Top N highest values by group
data_new3 <- data.table(data_new3, key = "group")
data_new3 <- data_new3[ , head(.SD, 3), by = group]
data_new3                                                  # Print updated data

 

Table 3: data.table containing the three highest values of each group.

 

After running the previous syntax, the data.table shown in Table 3 has been created. It contains only the three highest values in each group.
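The sorting and the selection by group can also be combined in a single data.table call. The code below is a minimal sketch of this variant, assuming that the data.table package is installed. The object name data_alt3 is my own illustrative choice.

```r
library("data.table")                                      # Load data.table package

data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)

data_alt3 <- as.data.table(data)[order(-value),            # Sort by value in descending order
                                 head(.SD, 3),             # Keep first three rows per group
                                 by = group]
data_alt3                                                  # Print result
```

Because no key is set in this version, the groups should appear in the order in which they occur in the sorted data rather than in alphabetical order. The selected values of each group are the same as before.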

 

Video & Further Resources

Have a look at the following video on my YouTube channel. In the video, I illustrate the R programming code of this article in RStudio.

 

 


 

At this point, you should know how to return the highest N values in a variable by group in the R programming language. Let me know in the comments below if you have any additional comments or questions. Furthermore, don’t forget to subscribe to my email newsletter in order to get updates on the newest posts.

 



8 Comments

  • Hello,

    Thank you so much for your article. It is really helpful for me. Now I have a problem when applying your code because my N value changes depending on the group. For example, in group a, I need 2 maximum values. In group b, I will only need 1 maximum value. What should I do in R once my value of n changes like this?

    I hope to receive your support! Thank you so much in advance!

    • Hi Norido,

      Thank you so much for the kind words, glad you find my tutorial helpful!

      I apologize for the delayed reply. I was on a long holiday, so unfortunately I wasn’t able to get back to you earlier. Do you still need help with your syntax?

      Regards,
      Joachim

  • Thank you very much for your tutorial. I have a question: I want to find the highest and smallest value relative to a cutoff value.
    I have not been able to do it yet. Do you have any idea how to do this?
    Thank you very much for your help.

  • Sorry, of course.
    I have a threshold of 0.19. I need the largest value that comes after and the smallest value before this threshold.

    • Hello Constanz,

      Check this out:

      x <- c(2, 4, 8, 6, 10, 14, 16, 12, 18, 20)          # Example vector

      threshold <- 10                                     # Cutoff value

      closest_below_threshold <- max(x[x < threshold])    # Largest value below the threshold
      closest_below_threshold
      # 8

      closest_above_threshold <- min(x[x > threshold])    # Smallest value above the threshold
      closest_above_threshold
      # 12

      Best,
      Cansu

  • Thank you for your help! 🙂

