Sum by Group in R (2 Examples)
In this article, I’ll explain how to compute the sum by group in the R programming language.
I’ll show two different alternatives including reproducible R code. More precisely, this tutorial contains the following topics:
- Creation of Example Data
- Sum by Group (aggregate Function of Base R)
- Sum by Group (group_by Function of dplyr Package)
- Further Resources
So now the part you have been waiting for…
Creation of Example Data
In the examples of this tutorial, I’ll use the Iris Flower data set as example data. Let’s load the data into R:
data(iris)   # Load Iris data
head(iris)   # First rows of Iris
Table 1: The Iris Data Set (First Six Rows).
Table 1 shows the structure of the Iris data set. The data frame consists of four numeric columns as well as the grouping variable Species.
In the following examples, we will compute the sum of the first column vector Sepal.Length within each Species group.
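To make the goal concrete, the sum for a single group can be computed by hand with logical subsetting. This is only an illustrative sketch (the object name setosa_sum is my own choice); the examples below do the same for all groups at once.

```r
data(iris)                                   # Load Iris data

# Sum of Sepal.Length for the rows where Species is "setosa"
setosa_sum <- sum(iris$Sepal.Length[iris$Species == "setosa"])
setosa_sum
# [1] 250.3

table(iris$Species)                          # Number of rows per group (50 each)
```

Repeating this for every group quickly becomes tedious, which is why functions that handle all groups in one call are preferable.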
Example 1: Sum by Group Based on aggregate R Function
In the first example, I’ll show you how to compute the sum by group with the aggregate function.
An advantage of the aggregate function is that it is already included in your Base R installation. Therefore, we do not need to install any add-on packages.
The aggregate function can be used to calculate the summation of each group as follows:
aggregate(x = iris$Sepal.Length,     # Specify data column
          by = list(iris$Species),   # Specify group indicator
          FUN = sum)                 # Specify function (i.e. sum)
#      Group.1     x
# 1     setosa 250.3
# 2 versicolor 296.8
# 3  virginica 329.4
You can see based on the RStudio console output that the sum of all values of the setosa group is 250.3, the sum of the versicolor group is 296.8, and the sum of the virginica group is equal to 329.4.
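As a side note, aggregate also has a formula interface, which keeps the original column names in the output instead of Group.1 and x. The following sketch is an alternative, not the syntax used above; the object names sums_formula and sums_tapply are chosen for illustration. The tapply function of Base R returns the same sums in a more compact form.

```r
data(iris)                                   # Load Iris data

# Formula interface: Sepal.Length summed within each level of Species
sums_formula <- aggregate(Sepal.Length ~ Species, data = iris, FUN = sum)
sums_formula
#      Species Sepal.Length
# 1     setosa        250.3
# 2 versicolor        296.8
# 3  virginica        329.4

# tapply returns a named one-dimensional array instead of a data frame
sums_tapply <- tapply(iris$Sepal.Length, iris$Species, sum)
sums_tapply
```

The formula version is convenient when the result is passed on as a data frame, whereas tapply is handy when only the named sums are needed.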
Do you need more explanations on the computation of the sum based on a grouping variable with the aggregate function? Then have a look at the corresponding video on my YouTube channel, in which I explain the previous example in more detail.
Example 2: Sum by Group Based on dplyr Package
The dplyr package is a powerful R add-on package that many R users rely on for data manipulation. In case you also prefer to work within the dplyr framework, you can use the R syntax of this example for the computation of the sum by group.
First, we need to install and load the dplyr package in R:
install.packages("dplyr")   # Install dplyr package
library("dplyr")            # Load dplyr package
Now we can use the group_by and the summarise_at functions to get the summation by group:
iris %>%                               # Specify data frame
  group_by(Species) %>%                # Specify group indicator
  summarise_at(vars(Sepal.Length),     # Specify column
               list(name = sum))       # Specify function
# A tibble: 3 x 2
#   Species     name
#   <fct>      <dbl>
# 1 setosa      250.
# 2 versicolor  297.
# 3 virginica   329.
As you can see, the values are the same as in Example 1. They only appear rounded because tibbles print three significant digits by default; the stored values are identical.
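In recent dplyr versions (1.0.0 and later), summarise_at is marked as superseded, and the same result can be written with summarise directly. The sketch below assumes that dplyr is installed; the names sum_sepal, sums_dplyr, and sums_base are chosen for illustration. It also checks that the unrounded dplyr values match the Base R result of Example 1.

```r
library("dplyr")                             # Load dplyr package

sums_dplyr <- iris %>%                       # Specify data frame
  group_by(Species) %>%                      # Specify group indicator
  summarise(sum_sepal = sum(Sepal.Length))   # One sum per Species

as.data.frame(sums_dplyr)                    # Print as a plain data frame

sums_base <- aggregate(x = iris$Sepal.Length,
                       by = list(iris$Species),
                       FUN = sum)            # Base R result for comparison

all.equal(sums_dplyr$sum_sepal, sums_base$x) # Compare both results
# [1] TRUE
```

Converting the tibble with as.data.frame is just one way to see the values in the usual Base R print format.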
Further Resources & Summary
This tutorial showed how to calculate group sums in the R programming language. There is much more to learn about adding numeric values and about R in general, so you might want to have a look at some of the other R tutorials that I have published on my website:
- aggregate Function in R
- Sum in R
- Weighted Sum in R
- Mean by Group in R
- Column & Row Sums & Means
- The cumsum Function in R
- R Functions List (+ Examples)
- The R Programming Language
This tutorial explained how to add values in order to compute the sum of a column, a variable, or a simple vector, i.e. summarizing values by a group such as dates, names, or countries. In case you have any further questions on this topic, please let me know in the comments.