aggregate Function in R (3 Examples)

In this tutorial you’ll learn how to apply the aggregate function in the R programming language.

The table of contents looks like this:

1) Definition & Basic R Syntax of aggregate Function

2) Creation of Example Data

3) Example 1: Compute Mean by Group Using aggregate Function

4) Example 2: Compute Sum by Group Using aggregate Function

5) Example 3: Applying aggregate Function to Data Containing NAs

6) Video, Further Resources & Summary

It’s time to dive into the examples:

Definition & Basic R Syntax of aggregate Function

Definition: The aggregate R function computes summary statistics of subgroups of a data set.

Basic R Syntax: You can find the basic R programming syntax of the aggregate function below.

aggregate(x = any_data, by = group_list, FUN = any_function)  # Basic R syntax of aggregate function
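To make the placeholders concrete, here is a minimal sketch that fills in the three arguments. It uses the built-in iris data set, which is my own choice for illustration and is not part of this tutorial's example data:

```r
# x: the values to summarize, by: a list of grouping elements, FUN: the summary function
res <- aggregate(x = iris$Sepal.Length,
                 by = list(Species = iris$Species),
                 FUN = mean)
res                                                           # One row per species
```

Because x is a plain vector here, the result column is simply called x; the grouping column takes its name from the list element (Species).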

In the following, I’ll explain in three examples how to apply the aggregate function in R.

Creation of Example Data

As a first step, let’s create some example data:

data <- data.frame(x1 = 1:5,                                  # Create example data
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))
data                                                          # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2  2  3  1     A
# 3  3  4  1     B
# 4  4  5  1     C
# 5  5  6  1     C

The previously shown output of the RStudio console shows that the example data has five rows and four columns. The variables x1, x2, and x3 contain numeric values and the variable group is a grouping indicator dividing our data into subgroups.
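If you want to check how the grouping indicator divides the rows before aggregating, the following sketch might help. It recreates the example data so that it runs on its own, and the object names group_sizes and x1_by_group are just illustrative:

```r
data <- data.frame(x1 = 1:5,                                  # Recreate example data
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

group_sizes <- table(data$group)                              # Number of rows per subgroup
group_sizes

x1_by_group <- split(data$x1, data$group)                     # Values of x1 in each subgroup
x1_by_group
```

Groups A and C contain two rows each and group B contains a single row. These are the subsets that a by-group summary will be computed on.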

Example 1: Compute Mean by Group Using aggregate Function

In Example 1, I’ll explain how to use the aggregate function to return the mean of each subgroup and of each variable of our example data. Within the aggregate function, we need to specify three arguments:

The input data.
The grouping indicator.
The function we want to apply to each subgroup.

Have a look at the following R code:

aggregate(x = data[ , colnames(data) != "group"],             # Mean by group
          by = list(data$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A 1.5 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 5.5  1

As you can see, the RStudio console returned the mean for each subgroup (i.e. A, B, and C) for each of our numeric variables (i.e. x1, x2, and x3).

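Note that we excluded the grouping indicator from the data passed to the x argument, because the mean function returns NA with a warning for a character column. Also note that we wrapped the grouping indicator in list(), because the by argument of the aggregate function must be a list of grouping elements.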
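As a possible alternative, here is a sketch of two variations on the code above: a named list in the by argument, which replaces the default Group.1 label, and the formula interface of aggregate, which refers to the grouping column by name. The object names by_named and by_formula are only for illustration:

```r
data <- data.frame(x1 = 1:5,                                  # Recreate example data
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

by_named <- aggregate(x = data[ , c("x1", "x2", "x3")],       # Named list sets the column label
                      by = list(group = data$group),
                      FUN = mean)

by_formula <- aggregate(. ~ group,                            # Formula interface: all other
                        data = data,                          # columns summarized by group
                        FUN = mean)
by_formula
#   group  x1  x2 x3
# 1     A 1.5 2.5  1
# 2     B 3.0 4.0  1
# 3     C 4.5 5.5  1
```

For this data both calls should return the same values. The dot on the left-hand side of the formula stands for all columns that are not used on the right-hand side.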

Example 2: Compute Sum by Group Using aggregate Function

In the previous example, we calculated the mean of each subgroup across multiple columns of our data frame. However, it is just as easy to apply other functions within the aggregate command. In Example 2, I’ll illustrate how to return the sum by group using the aggregate function:

aggregate(x = data[ , colnames(data) != "group"],             # Sum by group
          by = list(data$group),
          FUN = sum)
#   Group.1 x1 x2 x3
# 1       A  3  5  2
# 2       B  3  4  1
# 3       C  9 11  2

All we had to change was the FUN argument within the aggregate function. The previous output shows the sum by group for each numeric variable of our example data.
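The same pattern works for other summary functions. As a small sketch, setting FUN = length returns the number of rows in each subgroup, which is a different quantity than the sum. The object name counts is only for illustration:

```r
data <- data.frame(x1 = 1:5,                                  # Recreate example data
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

counts <- aggregate(x = data["x1"],                           # Count rows by group
                    by = list(group = data$group),
                    FUN = length)
counts
#   group x1
# 1     A  2
# 2     B  1
# 3     C  2
```

Here the x1 column of the output holds the number of observations per group, not values of x1.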

Example 3: Applying aggregate Function to Data Containing NAs

A typical problem when applying the aggregate function is missing values in the input data frame. Example 3 therefore explains how to handle NA values with the aggregate function. First, let’s insert some NA values into our example data:

data_NA <- data                                               # Create data containing NAs
data_NA$x1[2] <- NA
data_NA$x2[4] <- NA
data_NA                                                       # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2 NA  3  1     A
# 3  3  4  1     B
# 4  4 NA  1     C
# 5  5  6  1     C

The previous output of the RStudio console shows what our updated data looks like. As you can see, some data cells were set to NA.

Let’s try to apply the aggregate function as we did before:

aggregate(x = data_NA[ , colnames(data_NA) != "group"],       # aggregate without na.rm
          by = list(data_NA$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A  NA 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5  NA  1

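As you can see, some of the values in the output are NA, because the mean of a subgroup is NA as soon as one of its values is missing. Fortunately, we can ignore the NA values during the calculation by specifying na.rm = TRUE. The aggregate function passes this additional argument on to the mean function: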

aggregate(x = data_NA[ , colnames(data_NA) != "group"],       # Using na.rm option
          by = list(data_NA$group),
          FUN = mean,
          na.rm = TRUE)
#   Group.1  x1  x2 x3
# 1       A 1.0 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 6.0  1

Looks better!
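The following sketch compares three ways of handling the NAs. The first two should be equivalent: na.rm is handed on to mean, either as an extra argument or inside a small wrapper function. The third uses the formula interface, whose default na.action = na.omit drops every row that contains an NA before aggregating, so it can give different results. The object names via_dots, via_wrapper, and via_formula are only for illustration:

```r
data_NA <- data.frame(x1 = 1:5,                               # Recreate data containing NAs
                      x2 = 2:6,
                      x3 = 1,
                      group = c("A", "A", "B", "C", "C"))
data_NA$x1[2] <- NA
data_NA$x2[4] <- NA
num_cols <- data_NA[ , c("x1", "x2", "x3")]

via_dots <- aggregate(x = num_cols,                           # na.rm is passed on to mean
                      by = list(group = data_NA$group),
                      FUN = mean,
                      na.rm = TRUE)

via_wrapper <- aggregate(x = num_cols,                        # Same idea with a wrapper function
                         by = list(group = data_NA$group),
                         FUN = function(v) mean(v, na.rm = TRUE))

via_formula <- aggregate(. ~ group,                           # Rows 2 and 4 are dropped entirely
                         data = data_NA,
                         FUN = mean)

via_dots$x2                                                   # 2.5 4.0 6.0
via_formula$x2                                                # 2 4 6
```

In group A, the column-wise approach still uses the x2 value of row 2, while the formula interface discards that row because its x1 value is missing. Which behavior is appropriate depends on your analysis.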

Video, Further Resources & Summary

Do you need further info on the R code of this tutorial? Then you might have a look at the following video on my YouTube channel. I’m explaining the examples of this post in the video.

The YouTube video will be added soon.

Furthermore, you might want to have a look at the other articles on my website. I have released several articles already.

Summary: You learned in this article how to use the aggregate function to compute descriptive statistics by group in the R programming language. In case you have any additional questions or comments, don’t hesitate to let me know in the comments below.

2 Comments

Arturo
April 17, 2023 6:41 pm

What would you do if instead of applying a math function, you wanted to group character observations separated by a “,”?

For example, if I have the following data frame:

bd<-data.frame(id=c("01","02","01","03","02"),
pet=c("dog","dog","cat","cat","pig"))

And I want to use aggregate to get the following result:

id pet
01 "dog,cat"
02 "dog,pig"
03 "cat"

- Cansu (Statistics Globe)
  April 18, 2023 1:36 pm
  Hello Arturo,
  
  See this solution:
  bd <- data.frame(id = c("01", "02", "01", "03", "02"),
                   pet = c("dog", "dog", "cat", "cat", "pig"))
  # Use paste to concatenate the pet strings for each id
  bd_concat <- aggregate(pet ~ id, data = bd, FUN = paste, collapse = ",")
  # Display the resulting data frame
  bd_concat
  #   id     pet
  # 1 01 dog,cat
  # 2 02 dog,pig
  # 3 03     cat
  Regards,
  Cansu