R dplyr Message: `summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.

In this article you’ll learn how to handle the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.” in the R programming language.

The article contains two examples that show how to handle this message. More precisely, it is structured as follows:

1) Example Data & Add-On Packages

2) Example 1: Reproduce the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.

3) Example 2: Avoid the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.

4) Video, Further Resources & Summary

Let’s dive right in…

Example Data & Add-On Packages

The first step is to create some example data:

data <- data.frame(gr1 = rep(LETTERS[1:4],    # Create example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)
data                                          # Print example data frame

Table 1: The example data frame.

Have a look at the previous table. It shows that our example data consists of twelve rows and three columns. The columns gr1 and gr2 have the character class, and the variable values has the integer class.
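The structure described above can also be checked directly in R. The following is a minimal sketch that recreates the example data and uses the base R functions dim() and sapply(). The character class of gr1 and gr2 assumes R 4.0.0 or later, where data.frame() no longer converts strings to factors by default.

```r
data <- data.frame(gr1 = rep(LETTERS[1:4],    # Recreate example data frame
                             each = 3),
                   gr2 = letters[1:2],        # Recycled to the length of gr1
                   values = 101:112)

dim(data)                                     # Number of rows and columns
sapply(data, class)                           # Class of each column
```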

For the following tutorial, I also need to install and load the dplyr package of the tidyverse.

install.packages("dplyr")                     # Install & load dplyr
library("dplyr")

Example 1: Reproduce the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.

In this example, I’ll explain how to replicate the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.”.

Let’s assume that we want to group our data using multiple columns (i.e. the group indicators gr1 and gr2). Then, we can use the group_by and summarise functions of the dplyr package as shown below:

data_group <- data %>%                        # Group data
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))
# `summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument.

As you can see, the previous R code has returned the message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.”.

However, if we have a look at the resulting data, everything looks fine:

data_group                                    # Print grouped data
# # A tibble: 8 × 3
# # Groups:   gr1 [4]
#   gr1   gr2   gr_sum
#   <chr> <chr>  <int>
# 1 A     a        204
# 2 A     b        102
# 3 B     a        105
# 4 B     b        210
# 5 C     a        216
# 6 C     b        108
# 7 D     a        111
# 8 D     b        222

So, why has this message even occurred?

The reason for the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.” is that the summarise function removes only the last group variable that was specified in the group_by function when multiple columns are used to group the data. In our example, the grouping by gr2 is dropped, and the output remains grouped by gr1. This is also visible in the line “Groups: gr1 [4]” of the printed tibble.

This message makes the user aware that the output is still grouped. The message itself does not change the summarised values. In other words: it is an informational message (not a warning or an error) that can usually be ignored. Keep in mind, however, that the result remains grouped by gr1, so later dplyr steps such as mutate are applied within each gr1 group unless the grouping is removed, for example with the ungroup function.
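The message changes no values, but the grouping it reports is still attached to the result and can matter in later steps. The following is a minimal sketch of this effect, using the group_vars, mutate, and ungroup functions of dplyr. The names share_within and share_overall are chosen only for illustration.

```r
library("dplyr")

data <- data.frame(gr1 = rep(LETTERS[1:4],    # Recreate example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_group <- data %>%                        # Summarise by two group columns
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values))

group_vars(data_group)                        # Remaining grouping variable
# "gr1"

share_within <- data_group %>%                # Shares are computed within each gr1 group
  mutate(share = gr_sum / sum(gr_sum))

share_overall <- data_group %>%               # Shares are computed over all rows
  ungroup() %>%
  mutate(share = gr_sum / sum(gr_sum))
```

In share_within, the shares add up to 1 within each level of gr1. In share_overall, they add up to 1 across the entire data set.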

Note: This message might suddenly occur in R code that ran without showing the message before. The reason for this is that the message was introduced together with the .groups argument in dplyr 1.0.0, so it appears after updating from an older version of the package.

Example 2: Avoid the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.

Even though this message might be useful in some situations, it might be confusing in others.

In case you want to disable such dplyr messages in your code, you may change the global options for the summarise function as shown below:

options(dplyr.summarise.inform = FALSE)

If we now run our code once again, the message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.” is not shown anymore:

data_group <- data %>%                        # Group data
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))
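As the message itself suggests, the .groups argument of the summarise function is another way to avoid the message without changing global options. The following is a minimal sketch, assuming dplyr 1.0.0 or later. The object names data_drop and data_keep are chosen only for illustration.

```r
library("dplyr")

data <- data.frame(gr1 = rep(LETTERS[1:4],    # Recreate example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_drop <- data %>%                         # Return an ungrouped result
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "drop")

group_vars(data_drop)                         # No grouping variables remain
# character(0)

data_keep <- data %>%                         # Keep both grouping variables
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "keep")

group_vars(data_keep)                         # Both grouping variables remain
# "gr1" "gr2"
```

Setting .groups = "drop_last" instead reproduces the grouping from Example 1 (i.e. grouped by gr1) without printing the message.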

Video, Further Resources & Summary

Would you like to learn more about the handling of the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.”? Then you might have a look at the video on my YouTube channel, in which I explain the R code of this tutorial.

In addition, you could have a look at the related tutorials on my website.

This tutorial has demonstrated how to deal with the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.” in the R programming language. In case you have further questions, tell me about it in the comments section below.

10 Comments

syz
September 14, 2022 8:15 pm

So this warning doesn’t impact anything in result right?

- Joachim
  September 19, 2022 11:17 am
  
  Hi Syz,
  
  This is correct.
  
  Regards,
  Joachim
  
Kot
November 15, 2022 2:46 am

Thank you, it works.

- Joachim
  November 15, 2022 8:25 am
  
  Hi Kot,
  
  That’s great to hear, thanks for the kind comment.
  
  Regards,
  Joachim
  
Omoneka Adams
January 30, 2023 11:05 pm

Hello, what would I do if I want my data frame to be successfully grouped by the two variables and not one?

- Cansu (Statistics Globe)
  January 31, 2023 9:22 am
  
  Hello Omoneka,
  
  As explained in the tutorial, it is just a friendly message. You can use the summarise function for multiple groups, as shown in the examples in the tutorial.
  
  Regards,
  Cansu
  
Alex
February 2, 2023 3:12 pm

That review is incorrect. In my case the error message does in fact impact the results. Specifically, it only groups on one of the columns instead of both (which serve as an index). No such error appears in data.table. For reference, the issue occurs with dplyr 1.0.10.

- Cansu (Statistics Globe)
  February 6, 2023 10:04 am
  
  Hello Alex,
  
  It is interesting. Could you please share your code with us?
  
  Regards,
  Cansu
  
James
May 12, 2023 10:43 pm

This may not have been available when this article was written, but as of dplyr 1.1.2 you can add another argument to `summarize` to avoid this message: `.groups = 'keep'`

- Cansu (Statistics Globe)
  May 15, 2023 7:54 am
  
  Hello James!
  
  Your input is appreciated.
  
  Regards,
  Cansu
  