aggregate Function in R (3 Examples)

 

In this tutorial you’ll learn how to apply the aggregate function in the R programming language.

It’s time to dive into the examples:

Definition & Basic R Syntax of aggregate Function

 

Definition: The aggregate R function computes summary statistics of subgroups of a data set.

 

Basic R Syntax: You can find the basic R programming syntax of the aggregate function below.

aggregate(x = any_data, by = group_list, FUN = any_function)  # Basic R syntax of aggregate function

 

In the following, I’ll explain in three examples how to apply the aggregate function in R.

 

Creation of Example Data

As a first step, let’s create some example data:

data <- data.frame(x1 = 1:5,                                  # Create example data
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))
data                                                          # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2  2  3  1     A
# 3  3  4  1     B
# 4  4  5  1     C
# 5  5  6  1     C

The output above shows that the example data has five rows and four columns. The variables x1, x2, and x3 contain numeric values, and the variable group is a grouping indicator that divides our data into subgroups.

 

Example 1: Compute Mean by Group Using aggregate Function

In Example 1, I’ll explain how to use the aggregate function to return the mean of each subgroup and of each variable of our example data. Within the aggregate function, we need to specify three arguments:

  • The input data.
  • The grouping indicator.
  • The function we want to apply to each subgroup.

Have a look at the following R code:

aggregate(x = data[ , colnames(data) != "group"],             # Mean by group
          by = list(data$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A 1.5 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 5.5  1

As you can see, the RStudio console returned the mean for each subgroup (i.e. A, B, and C) for each of our numeric variables (i.e. x1, x2, and x3).

Note that we excluded the grouping indicator from the input data frame, because the mean cannot be computed for a character column. Also note that we wrapped the grouping indicator in a list, because the by argument of the aggregate function has to be a list of grouping elements.
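As a side note, the generic column name Group.1 can be avoided. The following is a minimal sketch of two options, assuming the same example data as above: naming the list element passed to by, or using the formula interface of aggregate, where the dot stands for all columns other than group. The object names res_named and res_formula are only used for illustration.

```r
# Recreate the example data so that this block runs on its own
data <- data.frame(x1 = 1:5,
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

# Option 1: name the list element to get a readable group column
res_named <- aggregate(x = data[ , colnames(data) != "group"],
                       by = list(group = data$group),
                       FUN = mean)

# Option 2: formula interface, "." means all columns except group
res_formula <- aggregate(. ~ group, data = data, FUN = mean)
res_formula
#   group  x1  x2 x3
# 1     A 1.5 2.5  1
# 2     B 3.0 4.0  1
# 3     C 4.5 5.5  1
```

Both calls should return the same means as Example 1; only the name of the grouping column differs.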

 

Example 2: Compute Sum by Group Using aggregate Function

In the previous example, we calculated the mean of each subgroup across multiple columns of our data frame. However, it is easy to apply other functions within the aggregate command. In Example 2, I’ll illustrate how to return the sum by group using the aggregate function:

aggregate(x = data[ , colnames(data) != "group"],             # Sum by group
          by = list(data$group),
          FUN = sum)
#   Group.1 x1 x2 x3
# 1       A  3  5  2
# 2       B  3  4  1
# 3       C  9 11  2

All we had to change was the FUN argument within the aggregate function. The previous output shows the sum by group for each numeric variable of our example data.
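If you need the number of observations per group instead of a sum, one possible approach is to pass the length function to the FUN argument. The sketch below assumes the same example data as before; res_count is just an illustrative object name.

```r
# Recreate the example data so that this block runs on its own
data <- data.frame(x1 = 1:5,
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

# Count by group: length returns the number of values in each subgroup
res_count <- aggregate(x = data[ , colnames(data) != "group"],
                       by = list(data$group),
                       FUN = length)
res_count
#   Group.1 x1 x2 x3
# 1       A  2  2  2
# 2       B  1  1  1
# 3       C  2  2  2
```

Note that length also counts NA values, so the counts would stay the same if the data contained missing values.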

 

Example 3: Applying aggregate Function to Data Containing NAs

A typical problem when applying the aggregate function is missing values in the input data frame. Example 3 therefore explains how to handle NA values with the aggregate function. First, let’s insert some NA values into our example data:

data_NA <- data                                               # Create data containing NAs
data_NA$x1[2] <- NA
data_NA$x2[4] <- NA
data_NA                                                       # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2 NA  3  1     A
# 3  3  4  1     B
# 4  4 NA  1     C
# 5  5  6  1     C

The previous output of the RStudio console shows what our updated data looks like. As you can see, some data cells were set to NA.

Let’s try to apply the aggregate function as we did before:

aggregate(x = data_NA[ , colnames(data_NA) != "group"],       # aggregate without na.rm
          by = list(data_NA$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A  NA 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5  NA  1

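As you can see, some of the values in the output are NA, because the mean function returns NA as soon as a subgroup contains a missing value. Fortunately, we can ignore the NA values in the calculation by adding na.rm = TRUE to the aggregate call. This argument is passed on to the function specified in FUN: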

aggregate(x = data_NA[ , colnames(data_NA) != "group"],       # Using na.rm option
          by = list(data_NA$group),
          FUN = mean,
          na.rm = TRUE)
#   Group.1  x1  x2 x3
# 1       A 1.0 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 6.0  1

Looks better!
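One caveat applies if you use the formula interface of aggregate instead of the x and by arguments: its na.action argument defaults to na.omit, which removes every row that contains an NA in any variable before the summary is computed. The following sketch, which rebuilds the data with NAs so that it runs on its own, compares this default with na.action = na.pass combined with na.rm = TRUE. The object names res_omit and res_pass are only illustrative.

```r
# Recreate the example data containing NAs
data_NA <- data.frame(x1 = c(1, NA, 3, 4, 5),
                      x2 = c(2, 3, 4, NA, 6),
                      x3 = 1,
                      group = c("A", "A", "B", "C", "C"))

# Default na.action = na.omit: rows 2 and 4 are dropped completely
res_omit <- aggregate(. ~ group, data = data_NA, FUN = mean)
res_omit
#   group x1 x2 x3
# 1     A  1  2  1
# 2     B  3  4  1
# 3     C  5  6  1

# Keep all rows and let mean skip NAs separately for each column
res_pass <- aggregate(. ~ group, data = data_NA, FUN = mean,
                      na.rm = TRUE, na.action = na.pass)
res_pass
#   group  x1  x2 x3
# 1     A 1.0 2.5  1
# 2     B 3.0 4.0  1
# 3     C 4.5 6.0  1
```

The second call reproduces the result of the na.rm example above, whereas the first call silently discards the non-missing values in rows 2 and 4.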

 

Video, Further Resources & Summary

Do you need further info on the R code of this tutorial? Then you might have a look at the following video on my YouTube channel. I explain the examples of this post in the video.

 

The YouTube video will be added soon.

 

Furthermore, you might want to have a look at the other articles on my website. I have released several articles already.

 

Summary: In this article, you learned how to use the aggregate function to compute descriptive statistics by group in the R programming language. Don’t hesitate to let me know in the comments below if you have any additional questions.

 


