R dplyr Message: `summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.
In this article you’ll learn how to handle the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.” in the R programming language.
The article consists of two examples: the first reproduces the message, and the second shows how to avoid it.
Let’s dive right in…
Example Data & Add-On Packages
The first step is to create some example data:
data <- data.frame(gr1 = rep(LETTERS[1:4],    # Create example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)
data                                          # Print example data frame
The printed data frame shows that our example data consists of twelve rows and three columns. The columns gr1 and gr2 have the character class, and the variable values has the integer class.
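If you want to verify these properties yourself, the following sketch checks the dimensions and the column classes with base R functions. It recreates the example data so that it runs on its own, and it assumes R 4.0.0 or newer, where data.frame() keeps strings as characters by default. The object names data_dim and data_classes are chosen only for illustration.

```r
# Recreate the example data frame of this tutorial
data <- data.frame(gr1 = rep(LETTERS[1:4], each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_dim <- dim(data)                         # Number of rows and columns
data_classes <- sapply(data, class)           # Class of each column

data_dim                                      # Print dimensions
data_classes                                  # Print column classes
```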
For the following tutorial, I also need to install and load the dplyr package of the tidyverse.
install.packages("dplyr")                     # Install & load dplyr
library("dplyr")
Example 1: Reproduce the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.
In this example, I’ll explain how to replicate the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.”.
Let’s assume that we want to group our data using multiple columns (i.e. the group indicators gr1 and gr2). Then, we can use the group_by and summarise functions of the dplyr package as shown below:
data_group <- data %>%                        # Group data
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))
# `summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument.
As you can see, the previous R code has returned the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.”.
However, if we have a look at the resulting data, everything looks fine:
data_group                                    # Print grouped data
# # A tibble: 8 × 3
# # Groups:   gr1 [4]
#   gr1   gr2   gr_sum
#   <chr> <chr>  <int>
# 1 A     a        204
# 2 A     b        102
# 3 B     a        105
# 4 B     b        210
# 5 C     a        216
# 6 C     b        108
# 7 D     a        111
# 8 D     b        222
So, why has this message even occurred?
The reason for the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.” is that, when data is grouped by multiple columns, the summarise function by default drops only the last grouping variable that was specified in the group_by function (here gr2). The result is therefore still grouped by the remaining variable gr1, as the “Groups: gr1 [4]” line in the output above shows.
This message makes the user aware that the output is still grouped. The message itself does not change the summarised values. In other words: the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.” is an informational message rather than an error, and it can usually be ignored. Keep in mind, however, that later steps such as mutate or filter are applied separately within each remaining gr1 group unless you call ungroup() first.
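The remaining grouping can be inspected with the group_vars function of dplyr. The following sketch also illustrates how a later step behaves differently with and without that grouping. The object names shares_grouped and shares_overall are hypothetical, and the `.groups = "drop_last"` setting is used only to reproduce the default grouping without printing the message.

```r
library(dplyr)

# Recreate the example data frame of this tutorial
data <- data.frame(gr1 = rep(LETTERS[1:4], each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_group <- data %>%                        # Same grouping as the default
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "drop_last")

group_vars(data_group)                        # Remaining grouping variable: "gr1"

shares_grouped <- data_group %>%              # Shares within each gr1 group
  mutate(share = gr_sum / sum(gr_sum))

shares_overall <- data_group %>%              # Shares relative to the overall total
  ungroup() %>%
  mutate(share = gr_sum / sum(gr_sum))
```

In shares_grouped the shares add up to 1 within each gr1 group, whereas in shares_overall they add up to 1 across all rows.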
Note: This message might suddenly occur in R code that ran without showing the message before. The reason for this is that the message was introduced together with the .groups argument in dplyr 1.0.0, so code written for older versions of the package starts showing it after an update.
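To find out whether your installation is affected, you can check the installed dplyr version. The sketch below assumes that the message arrived together with the .groups argument in dplyr 1.0.0. The object names dplyr_version and has_groups_arg are only illustrative.

```r
dplyr_version <- packageVersion("dplyr")      # Installed dplyr version
has_groups_arg <- dplyr_version >= "1.0.0"    # TRUE if summarise() supports .groups

dplyr_version                                 # Print version
has_groups_arg                                # Print result of the check
```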
Example 2: Avoid the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.
Even though this message might be useful in some situations, it might be confusing in others.
In case you want to disable such dplyr messages in your code, you may change the global options for the summarise function as shown below:
options(dplyr.summarise.inform = FALSE)
If we now run our code once again, the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.” is not shown anymore:
data_group <- data %>%                        # Group data
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))
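As the message itself suggests, the .groups argument of the summarise function is an alternative to the global option: it silences the message for a single call and makes the grouping of the result explicit. Here is a minimal sketch of the three most common settings, with object names chosen only for illustration:

```r
library(dplyr)

# Recreate the example data frame of this tutorial
data <- data.frame(gr1 = rep(LETTERS[1:4], each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_drop <- data %>%                         # Remove all grouping from the result
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "drop")

data_keep <- data %>%                         # Keep both grouping variables
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "keep")

data_last <- data %>%                         # Drop only the last grouping variable
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "drop_last")

group_vars(data_drop)                         # character(0)
group_vars(data_keep)                         # "gr1" "gr2"
group_vars(data_last)                         # "gr1"
```

Unlike options(), which changes the behavior for the entire R session, the .groups argument only affects the call in which it is specified.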
Video, Further Resources & Summary
Would you like to learn more about the handling of the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.”? Then you might have a look at the following video on my YouTube channel. I’m explaining the R code of this tutorial in the video:
In addition, you could have a look at the related tutorials on my website. A selection of articles is listed below:
- group_by & summarize Functions don’t Work Properly
- Count Number of Rows by Group Using dplyr Package
- Select Row with Maximum or Minimum Value in Each Group
- Introduction to R Programming
This tutorial has demonstrated how to deal with the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.” in the R programming language. In case you have further questions, tell me about it in the comments section below.
10 Comments
So this warning doesn’t impact anything in result right?
Hi Syz,
This is correct.
Regards,
Joachim
Thank you, it works.
Hi Kot,
That’s great to hear, thanks for the kind comment.
Regards,
Joachim
hello, what would I do if I want my data frame to be successfully grouped by the two variables and not one.
Hello Omoneka,
As explained in the tutorial, it is just a friendly message. You can use the summarise function for multiple groups, as shown in the examples in the tutorial.
Regards,
Cansu
That review is incorrect. In my case the error message does in fact impact the results. Specifically, it only groups on one of the columns instead of both (which serve as an index). No such error appears in data.table. For reference, the issue occurs with dplyr 1.0.10.
Hello Alex,
It is interesting. Could you please share your code with us?
Regards,
Cansu
This may not have been available when this article was written, but as of dplyr 1.1.2 you can add another argument to `summarize` to avoid this message: `.groups = "keep"`
Hello James!
Your input is appreciated.
Regards,
Cansu