# aggregate Function in R (3 Examples)

In this tutorial you'll learn how to apply the aggregate function in the R programming language.

It's time to dive into the examples:

## Definition & Basic R Syntax of aggregate Function

Definition: The aggregate R function computes summary statistics of subgroups of a data set.

Basic R Syntax: You can find the basic R programming syntax of the aggregate function below.

aggregate(x = any_data, by = group_list, FUN = any_function)  # Basic R syntax of aggregate function

In the following, I'll explain in three examples how to apply the aggregate function in R.

## Creation of Example Data

As a first step, let's create some example data:

data <- data.frame(x1 = 1:5,                                  # Create example data
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))
data                                                          # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2  2  3  1     A
# 3  3  4  1     B
# 4  4  5  1     C
# 5  5  6  1     C

The RStudio console output above shows that the example data has five rows and four columns. The variables x1, x2, and x3 contain numeric values, and the variable group is a grouping indicator that divides our data into the subgroups A, B, and C.

## Example 1: Compute Mean by Group Using aggregate Function

In Example 1, I'll explain how to use the aggregate function to return the mean of each variable within each subgroup of our example data. Within the aggregate function, we need to specify three arguments:

• The input data.
• The grouping indicator.
• The function we want to apply to each subgroup.

Have a look at the following R code:

aggregate(x = data[ , colnames(data) != "group"],             # Mean by group
          by = list(data$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A 1.5 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 5.5  1

As you can see, the RStudio console returned the mean for each subgroup (i.e. A, B, and C) for each of our numeric variables (i.e. x1, x2, and x3).

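Note that we excluded the grouping indicator from the data frame passed to x, and that we wrapped the grouping indicator in a list. The by argument has to be a list of grouping elements, and leaving the character column group in the input data would make mean return NA (with a warning) for that column.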
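As a side note, the default column name Group.1 can be avoided by naming the element of the by list, and the formula interface of aggregate offers a more compact way to write the same call. The following is a minimal sketch that recreates the example data; the object names res_named and res_formula are chosen for illustration only.

```r
# Recreate the example data so that this block runs on its own
data <- data.frame(x1 = 1:5,
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

# Name the list element so that the output column is called "group"
res_named <- aggregate(x = data[ , colnames(data) != "group"],
                       by = list(group = data$group),
                       FUN = mean)
res_named
#   group  x1  x2 x3
# 1     A 1.5 2.5  1
# 2     B 3.0 4.0  1
# 3     C 4.5 5.5  1

# Formula interface: aggregate all remaining columns by group
res_formula <- aggregate(. ~ group, data = data, FUN = mean)
res_formula                                                   # Same means as above
```

Both calls should return the same group means; which one to prefer is mostly a matter of taste.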

## Example 2: Compute Sum by Group Using aggregate Function

In the previous example, we calculated the mean of each subgroup across multiple columns of our data frame. However, it is easy to apply other functions within the aggregate command. In Example 2, I'll illustrate how to return the sum by group using the aggregate function:

aggregate(x = data[ , colnames(data) != "group"],             # Sum by group
          by = list(data$group),
          FUN = sum)
#   Group.1 x1 x2 x3
# 1       A  3  5  2
# 2       B  3  4  1
# 3       C  9 11  2

All we had to change was the FUN argument within the aggregate function. The previous output shows the sum by group of our example data.
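The FUN argument is not limited to mean and sum. As a hedged sketch, the code below recreates the example data and uses length to count the rows per group, as well as an anonymous function that returns the range width (maximum minus minimum). The object names num_cols, res_count, and res_width are illustrative.

```r
# Recreate the example data so that this block runs on its own
data <- data.frame(x1 = 1:5,
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

num_cols <- data[ , colnames(data) != "group"]                # Numeric columns only

# Count of rows by group
res_count <- aggregate(x = num_cols,
                       by = list(data$group),
                       FUN = length)
res_count
#   Group.1 x1 x2 x3
# 1       A  2  2  2
# 2       B  1  1  1
# 3       C  2  2  2

# Range width by group using an anonymous function
res_width <- aggregate(x = num_cols,
                       by = list(data$group),
                       FUN = function(v) max(v) - min(v))
res_width
#   Group.1 x1 x2 x3
# 1       A  1  1  0
# 2       B  0  0  0
# 3       C  1  1  0
```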

## Example 3: Applying aggregate Function to Data Containing NAs

A typical problem when applying the aggregate function is missing values in the input data frame. Example 3 therefore explains how to handle NA values with the aggregate function. First, let's insert some NA values into our example data:

data_NA <- data                                               # Create data containing NAs
data_NA$x1[2] <- NA
data_NA$x2[4] <- NA
data_NA                                                       # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2 NA  3  1     A
# 3  3  4  1     B
# 4  4 NA  1     C
# 5  5  6  1     C

The previous output of the RStudio console shows what our updated data looks like. As you can see, two data cells were set to NA.

Let's try to apply the aggregate function as we did before:

aggregate(x = data_NA[ , colnames(data_NA) != "group"],       # aggregate without na.rm
          by = list(data_NA$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A  NA 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5  NA  1

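As you can see, some of the values in the output are NA, because mean returns NA as soon as a subgroup contains a missing value. Fortunately, we can ignore the NA values during the calculation by specifying na.rm = TRUE within the aggregate function, which passes this argument on to mean: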

aggregate(x = data_NA[ , colnames(data_NA) != "group"],       # Using na.rm option
          by = list(data_NA$group),
          FUN = mean,
          na.rm = TRUE)
#   Group.1  x1  x2 x3
# 1       A 1.0 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 6.0  1

Looks better!
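One related pitfall is worth a sketch: the formula interface of aggregate uses na.action = na.omit by default, which drops every row containing an NA before FUN is applied. The result can therefore differ from the na.rm = TRUE output above. The code below recreates data_NA and compares both variants; res_omit and res_pass are illustrative names.

```r
# Recreate the data containing NAs so that this block runs on its own
data_NA <- data.frame(x1 = c(1, NA, 3, 4, 5),
                      x2 = c(2, 3, 4, NA, 6),
                      x3 = 1,
                      group = c("A", "A", "B", "C", "C"))

# Default na.action = na.omit: rows 2 and 4 are removed completely
res_omit <- aggregate(. ~ group, data = data_NA, FUN = mean)
res_omit
#   group x1 x2 x3
# 1     A  1  2  1
# 2     B  3  4  1
# 3     C  5  6  1

# Keep all rows and let mean skip the NAs column by column
res_pass <- aggregate(. ~ group, data = data_NA, FUN = mean,
                      na.rm = TRUE, na.action = na.pass)
res_pass
#   group  x1  x2 x3
# 1     A 1.0 2.5  1
# 2     B 3.0 4.0  1
# 3     C 4.5 6.0  1
```

Note how the mean of x2 in group A changes from 2.5 to 2 when the complete row is dropped, even though x2 itself is not missing in that row.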

## Video, Further Resources & Summary

Do you need further info on the R code of this tutorial? Then you might have a look at the following video on my YouTube channel. I explain the examples of this post in the video.

Furthermore, you might want to have a look at the other articles of my website. I have released several articles already.

Summary: You learned in this article how to use the aggregate function to compute descriptive statistics by group in the R programming language. Don't hesitate to let me know in the comments below, in case you have any additional questions or comments.

• What would you do if instead of applying a math function, you wanted to group character observations separated by a “,”?

For example, if I have the following data frame:

bd<-data.frame(id=c("01","02","01","03","02"),
pet=c("dog","dog","cat","cat","pig"))

And I want to use aggregate to get the following result:

id pet
01 "dog,cat"
02 "dog,pig"
03 "cat"

• Hello Arturo,

See this solution:

bd<-data.frame(id=c("01","02","01","03","02"),
pet=c("dog","dog","cat","cat","pig"))

# Use paste to concatenate the pet strings for each id
bd_concat <- aggregate(pet ~ id, data = bd, FUN = paste, collapse = ",")

# Display the resulting data frame
bd_concat
#   id     pet
# 1 01 dog,cat
# 2 02 dog,pig
# 3 03     cat

Regards,
Cansu