# Mean by Group in R (2 Examples) | dplyr Package vs. Base R

In this tutorial you’ll learn how to **compute the mean by group** in the R programming language.

I’ll show **two different alternatives**, including reproducible R code.

Let’s dig into it!

## Example Data

For the following examples, I’m going to use the Iris Flower data set. Let’s load the data into R:

    data(iris)    # Load Iris data
    head(iris)    # First rows of Iris

**Table 1: The Iris Data Matrix.**

As you can see in Table 1, the Iris Flower data contains four numeric columns as well as the grouping factor column Species.
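To check this structure directly, here is a quick sketch that uses only base R functions. The object names `col_classes` and `group_sizes` are my own choices for illustration:

```r
data(iris)                                # Load Iris data

col_classes <- sapply(iris, class)        # Class of each column
group_sizes <- table(iris$Species)        # Number of rows per group

col_classes                               # Four numeric columns, one factor
levels(iris$Species)                      # Labels of the grouping factor
group_sizes                               # Group sizes
```

Each level of Species defines one group, so every group mean computed below is based on the rows counted in `group_sizes`.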

Next, I’ll show you how to calculate the average for each of these groups. Keep on reading!

## Example 1: Compute Mean by Group in R with aggregate Function

The first example shows how to calculate the mean per group with the aggregate function.

We can compute the mean for each species of the Iris Flower data by applying the aggregate function as follows:

    aggregate(x = iris$Sepal.Length,      # Specify data column
              by = list(iris$Species),    # Specify group indicator
              FUN = mean)                 # Specify function (i.e. mean)
    #      Group.1     x
    #       setosa 5.006
    #   versicolor 5.936
    #    virginica 6.588

The RStudio console output shows the mean by group: The setosa group has a mean of 5.006, the versicolor group has a mean of 5.936, and the virginica group has a mean of 6.588.

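**Note:** By replacing the FUN argument of the aggregate function, we can also compute other metrics such as the median, the variance, or the standard deviation. The mode requires a user-defined function, because the base R function mode() returns the storage type of an object rather than its most frequent value.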
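As a rough sketch of how swapping the FUN argument might look, the following code computes the median and the standard deviation per species. It also uses the formula interface of aggregate, which is an alternative to the x/by notation shown above. The object names are chosen for illustration only:

```r
data(iris)

# Median of Sepal.Length per species (x/by notation)
median_by_group <- aggregate(x = iris$Sepal.Length,
                             by = list(Species = iris$Species),
                             FUN = median)

# Standard deviation per species (formula notation)
sd_by_group <- aggregate(Sepal.Length ~ Species, data = iris, FUN = sd)

# Means of all four numeric columns at once
means_all <- aggregate(. ~ Species, data = iris, FUN = mean)

median_by_group
sd_by_group
means_all
```

Naming the list element in `by` (here `Species`) replaces the default column label Group.1 in the output.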

## Example 2: Compute Mean by Group with dplyr Package

It’s definitely a matter of taste, but many people prefer to use the dplyr package to compute descriptive statistics such as the mean. This example shows how to get the mean by group using the functions of the dplyr package.

Let’s install and load the dplyr package to R:

    install.packages("dplyr")    # Install dplyr package
    library("dplyr")             # Load dplyr package

Now, we can use all the functions of the dplyr package – in our case group_by and summarise_at:

    iris %>%                                # Specify data frame
      group_by(Species) %>%                 # Specify group indicator
      summarise_at(vars(Sepal.Length),      # Specify column
                   list(name = mean))       # Specify function (result column is called "name")
    # A tibble: 3 x 2
    #   Species     name
    #   <fct>      <dbl>
    # setosa        5.01
    # versicolor    5.94
    # virginica     6.59

The output of the previous R syntax is a tibble instead of a data.frame. The tibble prints the values rounded to three significant digits, but the stored results are the same as in Example 1.
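In dplyr 1.0.0 and later, the summarise_at family is superseded by summarise in combination with across. As a sketch of the same computation in that style, assuming a recent dplyr version is installed (the names `mean_single`, `mean_multi`, and `mean_sepal_length` are chosen here for illustration):

```r
library("dplyr")

# Single column: name the result column explicitly
mean_single <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))

# Several columns at once with across()
mean_multi <- iris %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Length, Sepal.Width), mean))

mean_single
mean_multi
```

With across, the summary columns keep the names of the input columns, so no separate naming step is needed.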

## Further Resources & Summary

This tutorial illustrated how to compute group means in the R programming language. In case you want to learn more about the theoretical concept of the mean, I can recommend a video on this topic from the mathantics YouTube channel.


Furthermore, you could also have a look at some of the related R tutorials that I have published on my website:

- aggregate Function in R
- Mean in R
- Mean Across Columns & Rows
- Weighted Mean in R
- Geometric Mean in R
- Harmonic Mean in R
- Mean of Data Frame Column
- Mean Imputation
- Median in R
- Mode in R
- R Functions List (+ Examples)
- The R Programming Language

I hope you found the tutorial helpful. However, if you have any questions or comments, don’t hesitate to let me know below.


## 12 Comments

After I used the dplyr option I’ve got this warning message

    funs() is soft deprecated as of dplyr 0.8.0
    please use list() instead

    # Before:
    funs(name = f(.))

    # After:
    list(name = ~ f(.))

So, maybe you could update the example code as:

    iris %>%
      group_by(Species) %>%
      summarise_at(vars(Sepal.Length), list(name = mean))

Regards!

Hey Ruben,

Thank you for the hint, I’ve just changed the code accordingly 🙂

Regards,

Joachim

Hi! I’m fairly new to using R. After finding the mean values of my two categorical groups, how would I plot those means in a bar chart? Thank you

Hi Ben,

Good question! You may have a look at this tutorial: https://statisticsglobe.com/barplot-in-r

You may use the means of your two groups as the heights of the bars, i.e. store the two means in the vector “values” as shown in Example 1 of that tutorial.

Regards,

Joachim
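As a rough sketch of the approach described in this reply, the following code uses the species means of the Iris data as stand-in values (three groups instead of two, but the logic is the same). The vector name `values` follows the reply; the axis label is my own choice:

```r
data(iris)

# Group means as a named vector (one element per species)
values <- tapply(iris$Sepal.Length, iris$Species, mean)

# Bar heights are the group means, bar labels are the group names
barplot(height = values,
        names.arg = names(values),
        ylab = "Mean Sepal.Length")
```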

Thank you so much for the easy to follow instructions! I’ve been working on a project since last week for my job and this cleared everything up for me in 5 minutes! Thank you!!!

Thank you for the comment, Julianna. I’m glad to hear that it helped! 🙂

Thanks so much for this! It was very helpful 🙂

Thanks Alicia, I’m happy to hear that 🙂

Thank you for these instructions – very helpful! I am wondering if the output for the dplyr method is rounded? If so, is there a way for the output to not be rounded like the aggregate function?

Thanks!

Hi Jan,

Thank you for the comment and the kind words!

Actually, this is a very good question! The dplyr package returns the data in tibble format (in contrast to the data.frame format returned by the aggregate function). By default, tibbles display numbers rounded to three significant digits. However, the actual values stored in the tibble are NOT rounded.

You can see that by converting the tibble back to data.frame format:
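For example, a minimal sketch of such a conversion, assuming dplyr is installed (the names `means_tbl` and `mean_sepal_length` are chosen for illustration):

```r
library("dplyr")

means_tbl <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))

means_tbl                       # Tibble print method shortens the numbers
as.data.frame(means_tbl)        # data.frame print method shows more digits
means_tbl$mean_sepal_length     # Stored values are not rounded
```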

I also found this thread, which discusses this topic in more detail.

I hope that helps!

Joachim

Please help,

    # Calculate the average_pop and median_pop columns
    counties_selected %>%
      group_by(region, state) %>%
      summarize(total_pop = sum(population)) %>%
      ungroup() %>%
      summarize(average_pop = mean(population),
                median_pop = median(population))

I got “object population not found”

Hey Ngoc,

Thanks for the comment!

I need a few more details. Is counties_selected a data frame? Could you tell me the column names of counties_selected?

Regards,

Joachim
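One likely cause of this error, assuming counties_selected is a data frame that contains a population column: after the first summarize call, only the grouping columns and total_pop remain, so population no longer exists when the second summarize call runs. The sketch below reproduces the pattern with a small made-up data frame (`counties_demo` and all of its values are hypothetical) and refers to total_pop in the second step:

```r
library("dplyr")

# Hypothetical stand-in for counties_selected
counties_demo <- data.frame(
  region     = c("South", "South", "South", "West", "West"),
  state      = c("Texas", "Texas", "Florida", "Utah", "Utah"),
  population = c(100, 200, 300, 400, 500)
)

pop_summary <- counties_demo %>%
  group_by(region, state) %>%
  summarize(total_pop = sum(population)) %>%    # Remaining columns: region, state, total_pop
  ungroup() %>%
  summarize(average_pop = mean(total_pop),      # Use total_pop, not population
            median_pop = median(total_pop))

pop_summary
```

Whether this matches the intended result depends on the actual columns of counties_selected, which are not shown in the comment.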