aggregate Function in R (3 Examples)
In this tutorial you’ll learn how to apply the aggregate function in the R programming language.
It’s time to dive into the examples:
Definition & Basic R Syntax of aggregate Function
Definition: The aggregate R function computes summary statistics of subgroups of a data set.
Basic R Syntax: You can find the basic R programming syntax of the aggregate function below.
aggregate(x = any_data, by = group_list, FUN = any_function) # Basic R syntax of aggregate function
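As a minimal sketch of how the three arguments fit together, the following call averages a small vector within two groups. The objects values and groups are hypothetical toy inputs and are not part of this tutorial's example data.

```r
# Hypothetical toy input (not the tutorial's example data)
values <- c(10, 20, 30, 40)
groups <- c("a", "a", "b", "b")

# x = values to summarize, by = list of grouping vectors, FUN = summary function
res <- aggregate(x = values, by = list(groups), FUN = mean)
res                                                  # One row per group
```

The result is a data frame with one row per group: the grouping column is called Group.1 and the summarized column is called x.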
In the following, I’ll explain in three examples how to apply the aggregate function in R.
Creation of Example Data
As a first step, let’s create some example data:
data <- data.frame(x1 = 1:5,                         # Create example data
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))
data                                                 # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2  2  3  1     A
# 3  3  4  1     B
# 4  4  5  1     C
# 5  5  6  1     C
The previously shown output of the RStudio console shows that the example data has five rows and four columns. The variables x1, x2, and x3 contain numeric values and the variable group is a grouping indicator dividing our data into subgroups.
Example 1: Compute Mean by Group Using aggregate Function
In Example 1, I’ll explain how to use the aggregate function to return the mean of each subgroup and of each variable of our example data. Within the aggregate function, we need to specify three arguments:
- The input data.
- The grouping indicator.
- The function we want to apply to each subgroup.
Have a look at the following R code:
aggregate(x = data[ , colnames(data) != "group"],    # Mean by group
          by = list(data$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A 1.5 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 5.5  1
As you can see, the RStudio console returned the mean for each subgroup (i.e. A, B, and C) for each of our numeric variables (i.e. x1, x2, and x3).
Note that we had to exclude the grouping indicator from our data frame, and that we had to wrap the grouping indicator in a list. The by argument must be a list, and the grouping column has to be excluded because the mean cannot be computed for a character variable.
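Two variations on the call above may be convenient; this is a sketch, not a replacement for the syntax used in this tutorial. Naming the list element gives the grouping column a readable name instead of Group.1, and the formula interface of aggregate takes care of separating the grouping column from the value columns. The object names mean_named and mean_formula are chosen for illustration only.

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = 1:5,
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

# Variation 1: a named list element sets the name of the grouping column
mean_named <- aggregate(x = data[ , colnames(data) != "group"],
                        by = list(group = data$group),
                        FUN = mean)

# Variation 2: formula interface; "." stands for all columns other than group
mean_formula <- aggregate(. ~ group, data = data, FUN = mean)

mean_named                                           # Print both results
mean_formula
```

Keep in mind that the formula interface treats missing values differently by default: it uses na.action = na.omit, which drops incomplete rows before aggregating.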
Example 2: Compute Sum by Group Using aggregate Function
In the previous example, we calculated the mean of each subgroup across multiple columns of our data frame. However, it is easy to apply other functions within the aggregate command. In Example 2, I’ll illustrate how to return the sum by group using the aggregate function:
aggregate(x = data[ , colnames(data) != "group"],    # Sum by group
          by = list(data$group),
          FUN = sum)
#   Group.1 x1 x2 x3
# 1       A  3  5  2
# 2       B  3  4  1
# 3       C  9 11  2
All we had to change was the FUN argument within the aggregate function. The previous output shows the sum by group of our example data.
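The same pattern works for any function that reduces a vector to a single value. As a minimal sketch, the following code uses length to count the rows in each group and a small anonymous function to compute the spread (maximum minus minimum) by group. The object names count_by_group and spread_by_group are chosen for illustration only.

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = 1:5,
                   x2 = 2:6,
                   x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

# Count by group: length returns the number of values in each subgroup
count_by_group <- aggregate(x = data[ , colnames(data) != "group"],
                            by = list(data$group),
                            FUN = length)

# Custom summary: difference between largest and smallest value per subgroup
spread_by_group <- aggregate(x = data[ , colnames(data) != "group"],
                             by = list(data$group),
                             FUN = function(v) max(v) - min(v))

count_by_group                                       # Print both results
spread_by_group
```

With the example data, groups A and C contain two rows each and group B contains one row, so the count differs from the sum shown above.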
Example 3: Applying aggregate Function to Data Containing NAs
A typical problem when applying the aggregate function is missing values in the input data frame. Example 3 therefore explains how to handle NA values with the aggregate function. First, let’s insert some NA values into our example data:
data_NA <- data                                      # Create data containing NAs
data_NA$x1[2] <- NA
data_NA$x2[4] <- NA
data_NA                                              # Print data
#   x1 x2 x3 group
# 1  1  2  1     A
# 2 NA  3  1     A
# 3  3  4  1     B
# 4  4 NA  1     C
# 5  5  6  1     C
The previous output of the RStudio console shows what our updated data looks like. As you can see, two data cells were set to NA.
Let’s try to apply the aggregate function as we did before:
aggregate(x = data_NA[ , colnames(data_NA) != "group"],    # aggregate without na.rm
          by = list(data_NA$group),
          FUN = mean)
#   Group.1  x1  x2 x3
# 1       A  NA 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5  NA  1
As you can see, some of the values in the output are NA. Fortunately, we can ignore the NA values in the calculation by specifying na.rm = TRUE within the aggregate function. The argument is passed on to the mean function:
aggregate(x = data_NA[ , colnames(data_NA) != "group"],    # Using na.rm option
          by = list(data_NA$group),
          FUN = mean,
          na.rm = TRUE)
#   Group.1  x1  x2 x3
# 1       A 1.0 2.5  1
# 2       B 3.0 4.0  1
# 3       C 4.5 6.0  1
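Note that na.rm is not an argument of aggregate itself. Additional arguments are forwarded to the function given in FUN, so this only works if that function accepts na.rm. The following sketch shows an equivalent call that makes the NA handling explicit with an anonymous function. The object names via_dots and via_wrapper are chosen for illustration only.

```r
# Recreate the example data containing NAs
data_NA <- data.frame(x1 = c(1, NA, 3, 4, 5),
                      x2 = c(2, 3, 4, NA, 6),
                      x3 = 1,
                      group = c("A", "A", "B", "C", "C"))

# na.rm = TRUE is forwarded by aggregate to mean
via_dots <- aggregate(x = data_NA[ , colnames(data_NA) != "group"],
                      by = list(data_NA$group),
                      FUN = mean,
                      na.rm = TRUE)

# Equivalent: the anonymous function removes NAs itself
via_wrapper <- aggregate(x = data_NA[ , colnames(data_NA) != "group"],
                         by = list(data_NA$group),
                         FUN = function(v) mean(v, na.rm = TRUE))

all.equal(via_dots, via_wrapper)                     # Compare both results
```

The wrapper form is useful when the summary function has no na.rm argument, because the NA handling can then be written directly into the function body.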
Video, Further Resources & Summary
Do you need further info on the R code of this tutorial? Then you might have a look at the following video on my YouTube channel. In the video, I explain the examples of this post.
The YouTube video will be added soon.
Furthermore, you might want to have a look at the other articles on my website. I have released several related articles already.
- Mean by Group – dplyr Package vs. Base R
- Sum by Group in R
- Count Number of Cases within Each Group of Data Frame
- Count Unique Values in R
- R Functions List (+ Examples)
- The R Programming Language
Summary: In this article, you learned how to use the aggregate function to compute descriptive statistics by group in the R programming language. If you have any additional questions or comments, don’t hesitate to let me know in the comments below.