R dplyr Message: `summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.

 

In this article you’ll learn how to handle the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.” in the R programming language.

The article consists of two examples: the first one reproduces the message, and the second one shows how to avoid it.

Let’s dive right in…

 

Example Data & Add-On Packages

The first step is to create some example data:

data <- data.frame(gr1 = rep(LETTERS[1:4],    # Create example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)
data                                          # Print example data frame

 

Table 1: Example data frame with the columns gr1, gr2, and values.

 

Have a look at the previous table. It shows that our example data consists of twelve rows and three columns. The columns gr1 and gr2 have the character class and the variable values has the integer class.
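As a quick check of this structure, the following sketch prints the dimensions and the column classes of the data frame. Note that the character class of gr1 and gr2 assumes R version 4.0.0 or later, where data.frame() no longer converts strings to factors by default. The object names data_dim and data_classes are only illustrative:

```r
data <- data.frame(gr1 = rep(LETTERS[1:4],    # Re-create example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_dim <- dim(data)                         # Number of rows & columns
data_dim
# [1] 12  3

data_classes <- sapply(data, class)           # Class of each column
data_classes                                  # Assumes R >= 4.0.0
```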

For the following tutorial, I also need to install and load the dplyr package of the tidyverse.

install.packages("dplyr")                     # Install & load dplyr
library("dplyr")

 

Example 1: Reproduce the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.

In this example, I’ll explain how to replicate the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.”.

Let’s assume that we want to group our data using multiple columns (i.e. the group indicators gr1 and gr2). Then, we can use the group_by and summarise functions of the dplyr package as shown below:

data_group <- data %>%                        # Group data
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))
# `summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument.

As you can see, the previous R code has returned the message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.”.

However, if we have a look at the resulting data, everything looks fine:

data_group                                    # Print grouped data
# # A tibble: 8 × 3
# # Groups:   gr1 [4]
#   gr1   gr2   gr_sum
#   <chr> <chr>  <int>
# 1 A     a        204
# 2 A     b        102
# 3 B     a        105
# 4 B     b        210
# 5 C     a        216
# 6 C     b        108
# 7 D     a        111
# 8 D     b        222

So, why has this message even occurred?

The reason for the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.” is that the summarise function removes only the last grouping level. In case we are using multiple columns to group our data, the last group variable that was specified in the group_by function (here gr2) is dropped, and the output remains grouped by the remaining variable (here gr1).

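This message helps to make the user aware that the output is still grouped. The message itself does not change the values in the result. In other words: “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.” is an informational message, not a warning or an error, and it can usually be ignored. Keep in mind, however, that the remaining grouping by gr1 affects later operations on the data, because functions such as mutate or summarise are then applied group-wise.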
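The remaining grouping can be inspected with the group_vars function of the dplyr package. The following sketch (the object names data_share and data_total are just illustrative) also shows why the grouping may matter for later steps: the same mutate call returns shares within each gr1 group when applied to the grouped data, but shares of the overall total after applying ungroup:

```r
library("dplyr")

data <- data.frame(gr1 = rep(LETTERS[1:4],    # Re-create example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_group <- data %>%                        # Group data (default .groups)
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))

group_vars(data_group)                        # Remaining grouping variable
# [1] "gr1"

data_share <- data_group %>%                  # Share within each gr1 group
  mutate(share = gr_sum / sum(gr_sum))

data_total <- data_group %>%                  # Share of the overall total
  ungroup() %>%
  mutate(share = gr_sum / sum(gr_sum))
```

In data_share the shares add up to 1 within each level of gr1, whereas in data_total they add up to 1 across all rows.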

Note: This message might suddenly occur in R code that ran without showing the message before. The reason for this is that the message was added together with the `.groups` argument in dplyr 1.0.0. The grouping behavior itself did not change.
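If you are unsure whether your installed dplyr version already supports the `.groups` argument, a quick check could look as follows. This is only a sketch, and the object name has_groups_arg is illustrative:

```r
library("dplyr")

packageVersion("dplyr")                       # Installed dplyr version

has_groups_arg <- ".groups" %in%              # Does summarise() have .groups?
  names(formals(dplyr::summarise))
has_groups_arg
```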

 

Example 2: Avoid the Message – `summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.

Even though this message might be useful in some situations, it might be confusing in others.

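In case you want to disable such dplyr messages in your code, you may set the global option dplyr.summarise.inform to FALSE as shown below. Note that this setting only hides the message for the rest of the R session. The output of the summarise function is still grouped by gr1: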

options(dplyr.summarise.inform = FALSE)

If we now run our code once again, the message “`summarise()` has grouped output by ‘X’. You can override using the `.groups` argument.” is not shown anymore:

data_group <- data %>%                        # Group data
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))
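As the message itself suggests, an alternative to changing the global options is to set the `.groups` argument within the summarise function. In that case the message is not shown either, and the grouping of the output is stated explicitly in the code. The sketch below compares three of the possible values (the object names data_drop, data_drop_last, and data_keep are illustrative):

```r
library("dplyr")

data <- data.frame(gr1 = rep(LETTERS[1:4],    # Re-create example data frame
                             each = 3),
                   gr2 = letters[1:2],
                   values = 101:112)

data_drop <- data %>%                         # Return ungrouped output
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values), .groups = "drop")
group_vars(data_drop)
# character(0)

data_drop_last <- data %>%                    # Drop last group variable only
  group_by(gr1, gr2) %>%                      # (same grouping as in Example 1)
  dplyr::summarise(gr_sum = sum(values), .groups = "drop_last")
group_vars(data_drop_last)
# [1] "gr1"

data_keep <- data %>%                         # Keep both group variables
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values), .groups = "keep")
group_vars(data_keep)
# [1] "gr1" "gr2"
```

The summarised values are the same in all three cases; only the grouping of the returned tibble differs. If no further group-wise calculations are planned, `.groups = "drop"` is often a reasonable choice.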

 

Summary

This tutorial has demonstrated how to deal with the dplyr message “`summarise()` has grouped output by ‘gr1’. You can override using the `.groups` argument.” in the R programming language. In case you have further questions, tell me about it in the comments section below.

 
