# Select Top N Highest Values by Group in R (3 Examples)

This tutorial explains how to extract the N highest values within each group of a data frame column in the R programming language.

Let's dive into it!

## Creating Example Data

We'll use the following data as the basis for this R programming tutorial:

```
data <- data.frame(group = rep(letters[1:3], each = 5),    # Create example data
                   value = 1:15)
data                                                       # Print example data
```

As you can see based on Table 1, our example data is a data frame containing 15 rows and two columns. The variable group contains three different group indicators and the variable value contains the corresponding values.
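If you want to confirm this structure yourself, a quick check along the following lines should work. This is only a small sketch using standard base R functions; it rebuilds the example data so that it can be run on its own.

```r
data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)

dim(data)                                                  # Number of rows and columns
table(data$group)                                          # Number of rows per group
str(data)                                                  # Column names and types
```

Each of the three groups should show five rows, which is why we can ask for the top three values per group without running out of data.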

## Example 1: Extract Top N Highest Values by Group Using Base R

In Example 1, I'll show how to return the N highest data points of each group using the basic installation of the R programming language.

For this, we first have to sort our data based on the value column in descending order:

```
data_new1 <- data[order(data$value, decreasing = TRUE), ]  # Order data descending
```
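To see why this sorts the rows, it may help to look at what order returns on its own. The toy vector x below is a hypothetical example and not part of the tutorial data.

```r
x <- c(10, 30, 20)                                         # Hypothetical toy vector

idx <- order(x, decreasing = TRUE)                         # Positions from largest to smallest value
idx

x[idx]                                                     # Indexing with these positions sorts the vector
```

The same idea applies to the data frame: order returns row positions, and using them inside the square brackets rearranges the rows accordingly.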

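As a next step, we apply the Reduce, rbind, by, and head functions as shown below. The by function splits the sorted data by group, head keeps the first n rows of each piece, and Reduce combined with rbind stacks the pieces back into one data frame: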

```
data_new1 <- Reduce(rbind,                                 # Top N highest values by group
                    by(data_new1,
                       data_new1["group"],
                       head,
                       n = 3))
```

The previous R code has created a new data frame object called data_new1. Let's have a look at this data object:

```
data_new1                                                  # Print updated data
```

The output of the previous R programming syntax is shown in Table 2: we have created a data frame subset containing only the three cases with the highest values of each group.
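If you need this more than once, the base R steps can be wrapped in a small function. The following is only a sketch: the helper name top_n_by_group and its arguments are my own assumptions and not part of base R, and it uses split, lapply, and do.call instead of by and Reduce.

```r
data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)

# Hypothetical helper (not part of base R): keep the n highest values per group
top_n_by_group <- function(df, group_col, value_col, n = 3) {
  df_sorted <- df[order(df[[value_col]], decreasing = TRUE), ]  # Sort descending by value
  pieces <- split(df_sorted, df_sorted[[group_col]])            # One data frame per group
  top <- lapply(pieces, head, n = n)                            # First n rows of each group
  result <- do.call(rbind, top)                                 # Stack the groups again
  rownames(result) <- NULL                                      # Drop the combined row names
  result
}

data_top <- top_n_by_group(data, "group", "value", n = 3)  # Apply helper to example data
data_top                                                   # Print result
```

With the example data, this should return the same nine rows as the code above, ordered by group and then by descending value.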

## Example 2: Extract Top N Highest Values by Group Using dplyr Package

This example shows how to keep only the N observations with the highest values by group using the functions of the dplyr package.

First, we need to install and load the dplyr add-on package:

```
install.packages("dplyr")                                  # Install dplyr package
library("dplyr")                                           # Load dplyr
```

Next, we can use the arrange, desc, group_by, and slice functions to return a tibble containing only the three highest values in each group:

```
data_new2 <- data %>%                                      # Top N highest values by group
  arrange(desc(value)) %>%
  group_by(group) %>%
  slice(1:3)
data_new2                                                  # Print updated data
# # A tibble: 9 x 2
# # Groups:   group [3]
#   group value
#   <chr> <int>
# 1 a         5
# 2 a         4
# 3 a         3
# 4 b        10
# 5 b         9
# 6 b         8
# 7 c        15
# 8 c        14
# 9 c        13
```
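As an alternative to sorting and slicing, newer dplyr versions provide the slice_max function, which selects the rows with the largest values directly. The sketch below assumes dplyr 1.0.0 or later is installed. I set with_ties = FALSE so that exactly three rows per group are returned even if values are tied; with the default with_ties = TRUE, tied values could lead to more than three rows.

```r
library("dplyr")                                           # Load dplyr (assumes version 1.0.0 or later)

data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)

data_top <- data %>%                                       # Top 3 highest values by group
  group_by(group) %>%
  slice_max(order_by = value, n = 3, with_ties = FALSE) %>%
  ungroup()                                                # Remove grouping from the result
data_top                                                   # Print result
```

For the example data, this should select the same nine rows as the arrange and slice approach shown above.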

## Example 3: Extract Top N Highest Values by Group Using data.table Package

In this example, I'll show how to use the data.table package to retain only the highest N values of each data frame group.

First, we need to install and load the data.table package:

```
install.packages("data.table")                             # Install data.table package
library("data.table")                                      # Load data.table package
```

Now, we can apply the following R syntax to create a new data.table:

```
data_new3 <- data[order(data$value, decreasing = TRUE), ]  # Top N highest values by group
data_new3 <- data.table(data_new3, key = "group")
data_new3 <- data_new3[ , head(.SD, 3), by = group]
data_new3                                                  # Print updated data
```

After running the previous syntax, the data.table shown in Table 3 has been created. It contains only the three highest values in each group.
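The sorting and the selection can also be combined in a single data.table call. The following is a sketch of that variant rather than the only way to do it; it assumes the data.table package is installed and converts the example data with as.data.table first.

```r
library("data.table")                                      # Load data.table package

data <- data.frame(group = rep(letters[1:3], each = 5),    # Recreate example data
                   value = 1:15)

dt <- as.data.table(data)                                  # Convert data frame to data.table
dt_top <- dt[order(-value), head(.SD, 3), by = group]      # Sort descending, keep 3 rows per group
dt_top                                                     # Print result
```

Note that no key is set in this variant, so the groups appear in the order in which they occur in the sorted data (here starting with group c) instead of alphabetically.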

## Video & Further Resources

Have a look at the following video of my YouTube channel. In the video, I illustrate the R programming code of this article in RStudio.


At this point, you should know how to return the highest N values of a variable by group in the R programming language. Let me know in the comments below if you have any additional comments or questions. Furthermore, don't forget to subscribe to my email newsletter to get updates on the newest posts.


#### 8 Comments

• Hello,

Thank you so much for your article. It is really helpful for me. Now I have a problem when applying your code because my N value changes depending on the group. For example, in group a, I need 2 maximum values. In group b, I will only need 1 maximum value. What should I do in R once my value of n changes like this?

I hope to receive your support! Thank you so much in advance!

• Hi Norido,

Thank you so much for the kind words, glad you find my tutorial helpful!

I apologize for the delayed reply. I was on a long holiday, so unfortunately I wasn’t able to get back to you earlier. Do you still need help with your syntax?

Regards,
Joachim

• Thank you very much for your tutorial. I have a question: I want to find the highest and smallest values relative to a cutoff value.
I have not been able to do it yet. Do you have any idea how to do this?
Thank you very much for your help.

• Hello Constanza,

I am not sure I understand correctly. What do you mean by the highest and lowest value of a cutoff? Could you please give an example?

Best,
Cansu

• Sorry, of course.
I have a threshold of 0.19. I need the largest value that comes after this threshold and the smallest value before it.

• Hello Constanza,

Please have a look at the following example:

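```
x <- c(2, 4, 8, 6, 10, 14, 16, 12, 18, 20)

threshold <- 10

# Closest value below the threshold (the largest of all values smaller than it)
smallest_before_threshold <- max(x[x < threshold])
smallest_before_threshold
# 8

# Closest value above the threshold (the smallest of all values larger than it)
largest_after_threshold <- min(x[x > threshold])
largest_after_threshold
# 12
```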

Best,
Cansu