# How to Compute Summary Statistics by Group in R (3 Examples)

This page shows how to **calculate descriptive statistics by group** in R.

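The article contains the following topics:

- Construction of Example Data
- Example 1: Descriptive Summary Statistics by Group Using tapply Function
- Example 2: Descriptive Summary Statistics by Group Using dplyr Package
- Example 3: Descriptive Summary Statistics by Group Using purrr Package
- Video, Further Resources & Summary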

If you want to know more about these topics, keep reading!

## Construction of Example Data

First, we’ll need to create some example data:

```r
set.seed(549298)                                   # Create example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5],
                   stringsAsFactors = TRUE)        # Factor column (no longer the default since R 4.0.0)
head(data)                                         # Print head of example data
#             x group
# 1  0.38324291     A
# 2 -0.06604541     B
# 3 -1.98454741     C
# 4  3.44815045     D
# 5  4.11107771     E
# 6  4.07278357     A
```

Have a look at the previous output of the RStudio console. It shows that our example data has two columns. The variable x contains random numeric values drawn from a normal distribution (mean 1, standard deviation 3), and the variable group contains five different grouping labels (A to E).
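As a quick sanity check on that structure, the following sketch rebuilds the example data and counts the rows per group. Because `LETTERS[1:5]` is recycled across the 500 rows, each label should occur 100 times. The object name `group_counts` is only an illustrative choice and does not appear elsewhere in this tutorial.

```r
set.seed(549298)                          # Same seed as in the example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])  # Labels A to E are recycled over 500 rows

group_counts <- table(data$group)         # Number of rows per group label
group_counts

str(data)                                 # Column types and dimensions
```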

We could return descriptive statistics of our numeric data column x using the summary function as shown below:

```r
summary(data$x)                                    # Summary of entire data
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  -7.765  -1.045   1.115   1.117   3.151  10.216
```

However, this only returns the summary statistics of the whole data set. In the following examples, I’ll therefore show different ways to get summary statistics for each group of our data.

Keep on reading!

## Example 1: Descriptive Summary Statistics by Group Using tapply Function

In this example, I’ll show how to use the basic installation of the R programming language to return descriptive summary statistics by group. More precisely, I’m using the tapply function:

```r
tapply(data$x, data$group, summary)                # Summary by group using tapply
# $A
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  -7.236  -1.161   1.530   1.339   3.834   8.747 
# 
# $B
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  -7.148  -1.002   0.944   1.037   3.004  10.216 
# 
# $C
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  -6.636  -1.282   1.340   1.030   2.956   8.667 
# 
# $D
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -7.7652 -1.2207  0.7849  0.7280  2.3334  8.3459 
# 
# $E
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -5.4817 -0.3648  1.5931  1.4498  3.3325  7.6403
```

The output of the previous R syntax is a list containing one list element for each group. Each of these list elements contains basic summary statistics for the corresponding group.
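If a single table is easier to read than a list, the per-group summaries can be stacked into a matrix. This is a minimal sketch using base R only; `summary_list` and `summary_matrix` are illustrative names, and the data are recreated so that the block runs on its own.

```r
set.seed(549298)
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

summary_list <- tapply(data$x, data$group, summary)  # One summary per group
summary_matrix <- do.call(rbind, summary_list)        # Stack the summaries row by row
round(summary_matrix, 3)                              # One row per group, one column per statistic
```

Each row of the resulting matrix corresponds to one group, which makes it easier to compare a statistic such as the median across groups at a glance.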

## Example 2: Descriptive Summary Statistics by Group Using dplyr Package

Another alternative for the computation of descriptive summary statistics is provided by the dplyr package.

First, we have to install and load the dplyr package:

```r
install.packages("dplyr")                          # Install dplyr package
library("dplyr")                                   # Load dplyr package
```

Now, we can apply the group_by and summarize functions to calculate summary statistics by group:

```r
data %>%                                           # Summary by group using dplyr
  group_by(group) %>% 
  summarize(min = min(x),
            q1 = quantile(x, 0.25),
            median = median(x),
            mean = mean(x),
            q3 = quantile(x, 0.75),
            max = max(x))
# # A tibble: 5 x 7
#   group   min     q1 median  mean    q3   max
#   <fct> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
# 1 A     -7.24 -1.16   1.53  1.34   3.83  8.75
# 2 B     -7.15 -1.00   0.944 1.04   3.00 10.2 
# 3 C     -6.64 -1.28   1.34  1.03   2.96  8.67
# 4 D     -7.77 -1.22   0.785 0.728  2.33  8.35
# 5 E     -5.48 -0.365  1.59  1.45   3.33  7.64
```

The output of the previous R code is a tibble that contains basically the same values as the list created in Example 1. Whether you prefer to use the basic installation or the dplyr package is a matter of taste.
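For readers who want a data-frame result similar to the tibble above without loading an add-on package, a roughly equivalent sketch with the base R `aggregate` function is shown next. The helper `six_numbers` and the object `by_group` are hypothetical names introduced only for this illustration, and the data are recreated so that the block runs on its own.

```r
set.seed(549298)
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

six_numbers <- function(v) {               # Hypothetical helper: same six statistics as above
  c(min = min(v),
    q1 = unname(quantile(v, 0.25)),
    median = median(v),
    mean = mean(v),
    q3 = unname(quantile(v, 0.75)),
    max = max(v))
}

by_group <- aggregate(x ~ group, data = data, FUN = six_numbers)
by_group <- do.call(data.frame, by_group)  # Flatten the matrix column into plain columns
by_group
```

The `do.call(data.frame, ...)` step is needed because `aggregate` stores a vector-valued result as a single matrix column; after flattening, the statistics appear as separate columns named `x.min`, `x.q1`, and so on.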

## Example 3: Descriptive Summary Statistics by Group Using purrr Package

In Example 3, I’ll illustrate another alternative for the calculation of summary statistics by group in R.

This example relies on the functions of the purrr package (another add-on package provided by the tidyverse).

We first have to install and load the purrr package:

```r
install.packages("purrr")                          # Install & load purrr
library("purrr")
```

Now, we can use the following R code to produce another kind of output showing descriptive stats by group:

```r
data %>%                                           # Summary by group using purrr
  split(.$group) %>%
  map(summary)
# $A
#        x           group  
#  Min.   :-7.236   A:100  
#  1st Qu.:-1.161   B:  0  
#  Median : 1.530   C:  0  
#  Mean   : 1.339   D:  0  
#  3rd Qu.: 3.834   E:  0  
#  Max.   : 8.747          
# 
# $B
#        x           group  
#  Min.   :-7.148   A:  0  
#  1st Qu.:-1.002   B:100  
#  Median : 0.944   C:  0  
#  Mean   : 1.037   D:  0  
#  3rd Qu.: 3.004   E:  0  
#  Max.   :10.216          
# 
# $C
#        x           group  
#  Min.   :-6.636   A:  0  
#  1st Qu.:-1.282   B:  0  
#  Median : 1.340   C:100  
#  Mean   : 1.030   D:  0  
#  3rd Qu.: 2.956   E:  0  
#  Max.   : 8.667          
# 
# $D
#        x            group  
#  Min.   :-7.7652   A:  0  
#  1st Qu.:-1.2207   B:  0  
#  Median : 0.7849   C:  0  
#  Mean   : 0.7280   D:100  
#  3rd Qu.: 2.3334   E:  0  
#  Max.   : 8.3459          
# 
# $E
#        x            group  
#  Min.   :-5.4817   A:  0  
#  1st Qu.:-0.3648   B:  0  
#  Median : 1.5931   C:  0  
#  Mean   : 1.4498   D:  0  
#  3rd Qu.: 3.3325   E:100  
#  Max.   : 7.6403
```

Again, the values are basically the same.
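The `group` column shows up in every summary above because `summary` is applied to whole data frames. If only the numeric variable is of interest, one option is to split the vector `x` itself. The following is a minimal sketch, assuming the purrr package is installed; `x_by_group` and `group_means` are illustrative names, and the data are recreated so that the block runs on its own.

```r
library("purrr")

set.seed(549298)
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

x_by_group <- split(data$x, data$group)   # List with one numeric vector per group
map(x_by_group, summary)                  # Six-number summary of x for each group

group_means <- map_dbl(x_by_group, mean)  # Named numeric vector of group means
group_means
```

Typed variants such as `map_dbl` return a plain named vector instead of a list, which can be convenient when a single statistic per group is needed for further computation.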

## Video, Further Resources & Summary

Have a look at the following video of my YouTube channel. I’m explaining the topics of this article in the video:

*The YouTube video will be added soon.*

In addition, I recommend having a look at the other tutorials on this homepage. A selection of articles can be found below.

In this article, I showed how to **get summary statistics for each group of a data frame** in the R programming language. Don’t hesitate to let me know in the comments section if you have further questions or comments.
