How to Compute Summary Statistics by Group in R (3 Examples)

This page shows how to calculate descriptive statistics by group in R.

The article contains the following topics:

1) Construction of Example Data

2) Example 1: Descriptive Summary Statistics by Group Using tapply Function

3) Example 2: Descriptive Summary Statistics by Group Using dplyr Package

4) Example 3: Descriptive Summary Statistics by Group Using purrr Package

5) Video, Further Resources & Summary

If you want to know more about these topics, keep reading!

Construction of Example Data

First, we’ll need to create some example data:

set.seed(549298)                       # Create example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])
head(data)                             # Print head of example data
#             x group
# 1  0.38324291     A
# 2 -0.06604541     B
# 3 -1.98454741     C
# 4  3.44815045     D
# 5  4.11107771     E
# 6  4.07278357     A

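Have a look at the previous output of the RStudio console. It shows that our example data has two columns: the variable x contains 500 random values drawn from a normal distribution with mean 1 and standard deviation 3, and the variable group contains five grouping labels (A to E) that repeat row by row. Note that the outputs on this page display group as a factor, which was the default of data.frame() before R 4.0.0; in newer R versions the column is stored as a character vector unless you specify stringsAsFactors = TRUE.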
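Because LETTERS[1:5] is recycled across the 500 rows, every group should end up with the same number of observations. The following is a minimal sketch of how this could be checked; the object name group_sizes is only chosen for illustration:

```r
set.seed(549298)                       # Recreate example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

group_sizes <- table(data$group)       # Count rows per group
group_sizes                            # Each of the groups A to E has 100 rows
```

Equal group sizes are convenient here, but none of the examples on this page depend on them.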

We could return descriptive statistics of our numeric data column x using the summary function as shown below:

summary(data$x)                        # Summary of entire data
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -7.765  -1.045   1.115   1.117   3.151  10.216

However, this only returns the summary statistics for the data as a whole. In the following examples, I’ll therefore show different ways to get summary statistics for each group of our data.

Keep on reading!

Example 1: Descriptive Summary Statistics by Group Using tapply Function

In this example, I’ll show how to use base R (i.e. without any add-on packages) to return descriptive summary statistics by group. More precisely, I’m using the tapply function:

tapply(data$x, data$group, summary)    # Summary by group using tapply
# $A
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -7.236  -1.161   1.530   1.339   3.834   8.747 
# 
# $B
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -7.148  -1.002   0.944   1.037   3.004  10.216 
# 
# $C
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -6.636  -1.282   1.340   1.030   2.956   8.667 
# 
# $D
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -7.7652 -1.2207  0.7849  0.7280  2.3334  8.3459 
# 
# $E
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -5.4817 -0.3648  1.5931  1.4498  3.3325  7.6403

The output of the previous R syntax is a list-like object containing one element for each group. Each of these elements contains the basic summary statistics for the corresponding group.
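If a single table is easier to work with than a list, the group summaries can be stacked row by row. This is only a sketch of one possible approach in base R; the object names summary_list and summary_mat are chosen for illustration:

```r
set.seed(549298)                       # Recreate example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

summary_list <- tapply(data$x, data$group, summary)  # One summary per group
summary_mat <- do.call(rbind, summary_list)          # Stack summaries into a matrix
summary_mat                                          # One row per group, one column per statistic
```

The resulting matrix can be indexed like any other matrix, e.g. summary_mat["A", "Median"] to extract the median of group A.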

Example 2: Descriptive Summary Statistics by Group Using dplyr Package

Another alternative for the computation of descriptive summary statistics is provided by the dplyr package.

First, we have to install and load the dplyr package:

install.packages("dplyr")              # Install dplyr package
library("dplyr")                       # Load dplyr package

Now, we can apply the group_by and summarize functions to calculate summary statistics by group:

data %>%                               # Summary by group using dplyr
  group_by(group) %>% 
  summarize(min = min(x),
            q1 = quantile(x, 0.25),
            median = median(x),
            mean = mean(x),
            q3 = quantile(x, 0.75),
            max = max(x))
# # A tibble: 5 x 7
#   group   min     q1 median  mean    q3   max
#   <fct> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
# 1 A     -7.24 -1.16   1.53  1.34   3.83  8.75
# 2 B     -7.15 -1.00   0.944 1.04   3.00 10.2 
# 3 C     -6.64 -1.28   1.34  1.03   2.96  8.67
# 4 D     -7.77 -1.22   0.785 0.728  2.33  8.35
# 5 E     -5.48 -0.365  1.59  1.45   3.33  7.64

The output of the previous R code is a tibble that contains the same values as the list created in Example 1, rounded to three significant digits for display. Whether you prefer base R or the dplyr package is a matter of taste.
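One advantage of the dplyr approach is that the set of statistics is easy to adjust. As a hedged sketch, the code below adds the group size and the standard deviation (two statistics that are not part of the example above), and it uses the spelling summarise to illustrate that dplyr exports both spellings for the same function. The object name data_stats is chosen for illustration:

```r
library("dplyr")                       # Assumes dplyr is already installed

set.seed(549298)                       # Recreate example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

data_stats <- data %>%                 # Custom statistics by group
  group_by(group) %>%
  summarise(n = n(),                   # Number of rows per group
            mean = mean(x),
            sd = sd(x))
data_stats

identical(summarise, summarize)        # Check that both spellings are the same function
```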

Example 3: Descriptive Summary Statistics by Group Using purrr Package

In Example 3, I’ll illustrate another alternative for the calculation of summary statistics by group in R.

This example relies on the functions of the purrr package (another add-on package provided by the tidyverse).

We first have to install and load the purrr package:

install.packages("purrr")              # Install & load purrr
library("purrr")

Now, we can use the following R code to produce another kind of output showing descriptive stats by group:

data %>%                               # Summary by group using purrr
  split(.$group) %>%
  map(summary)
# $A
#       x          group  
# Min.   :-7.236   A:100  
# 1st Qu.:-1.161   B:  0  
# Median : 1.530   C:  0  
# Mean   : 1.339   D:  0  
# 3rd Qu.: 3.834   E:  0  
# Max.   : 8.747          
# 
# $B
#       x          group  
# Min.   :-7.148   A:  0  
# 1st Qu.:-1.002   B:100  
# Median : 0.944   C:  0  
# Mean   : 1.037   D:  0  
# 3rd Qu.: 3.004   E:  0  
# Max.   :10.216          
# 
# $C
#       x          group  
# Min.   :-6.636   A:  0  
# 1st Qu.:-1.282   B:  0  
# Median : 1.340   C:100  
# Mean   : 1.030   D:  0  
# 3rd Qu.: 2.956   E:  0  
# Max.   : 8.667          
# 
# $D
#       x           group  
# Min.   :-7.7652   A:  0  
# 1st Qu.:-1.2207   B:  0  
# Median : 0.7849   C:  0  
# Mean   : 0.7280   D:100  
# 3rd Qu.: 2.3334   E:  0  
# Max.   : 8.3459          
# 
# $E
#       x           group  
# Min.   :-5.4817   A:  0  
# 1st Qu.:-0.3648   B:  0  
# Median : 1.5931   C:  0  
# Mean   : 1.4498   D:  0  
# 3rd Qu.: 3.3325   E:100  
# Max.   : 7.6403

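Again, the values for x are the same as in the previous examples. The difference is that summary is applied to each entire subset, so the output also contains the counts of the group column.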
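If the counts of the group column are not needed, one option is to split only the x column before mapping. The sketch below shows this, and it also uses map_dbl to return a single statistic per group as a named numeric vector. The object names x_by_group, group_summaries, and group_means are chosen for illustration:

```r
library("purrr")                       # Assumes purrr is already installed

set.seed(549298)                       # Recreate example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

x_by_group <- split(data$x, data$group)       # List of numeric vectors, one per group

group_summaries <- map(x_by_group, summary)   # Full summary of x for each group
group_means <- map_dbl(x_by_group, mean)      # Named numeric vector of group means
group_means
```

map always returns a list, whereas map_dbl requires each result to be a single number, which makes it a good fit for statistics such as the mean or the median.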

Video, Further Resources & Summary

Have a look at the following video on my YouTube channel, in which I explain the topics of this article:

In addition, I can recommend having a look at the other tutorials on this homepage. A selection of articles can be found below.

R Programming Tutorials

In this article, I showed how to get a summary statistics table for each group of a data frame in the R programming language. Don’t hesitate to let me know in the comments section if you have further questions or comments.

4 Comments

Giuliana Spadaro
July 6, 2022 9:01 am

Thanks for the tutorial! Just a small note: in the summary by group using dplyr, the function should be ‘summarise’ (with S) instead of ‘summarize’ (with Z).

- Joachim
  July 8, 2022 7:05 pm
  
  Hey Giuliana,
  
  Thank you for the kind comment! summarise and summarize are treated the same, though. Have a look here for more details.
  
  Regards,
  Joachim
  
andre
September 7, 2022 4:05 pm

thanks again

- Joachim
  September 8, 2022 6:34 am
  
  You are very welcome Andre! 🙂
  