Sum by Group in R (2 Examples)

 

In this article, I’ll explain how to compute the sum by group in the R programming language.

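I’ll show two different alternatives including reproducible R code. More precisely, this tutorial covers the creation of example data, a first example based on the aggregate function, and a second example based on the dplyr package.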

So now the part you have been waiting for…

 

Creation of Example Data

In the examples of this tutorial, I’ll use the Iris Flower data set as example data. Let’s load the data to RStudio:

data(iris)                                      # Load Iris data
head(iris)                                      # First rows of Iris

 


Table 1: The Iris Data Set (First Six Rows).

 

Table 1 shows the structure of the Iris data set. The data frame consists of four numeric columns as well as the grouping variable Species.

In the following examples, we will compute the sum of the first column vector Sepal.Length within each Species group.
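Before summing, it can help to confirm the structure described above. The following minimal sketch uses only Base R functions on the built-in iris data; the object names col_classes and group_sizes are chosen here for illustration.

```r
data(iris)                                      # Load Iris data

col_classes <- sapply(iris, class)              # Class of each column
col_classes                                     # Four numeric columns, one factor

group_sizes <- table(iris$Species)              # Number of rows per group
group_sizes                                     # 50 rows in each Species group
```

Each Species group contains the same number of rows, so the group sums computed below are directly comparable.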

 

Example 1: Sum by Group Based on aggregate R Function

In the first example, I’ll show you how to compute the sum by group with the aggregate function.

An advantage of the aggregate function is that it is already included in your Base R installation. Therefore we do not need to install any add-on packages.

The aggregate function can be used to calculate the summation of each group as follows:

aggregate(x = iris$Sepal.Length,                # Specify data column
          by = list(iris$Species),              # Specify group indicator
          FUN = sum)                            # Specify function (i.e. sum)
#      Group.1     x
# 1     setosa 250.3
# 2 versicolor 296.8
# 3  virginica 329.4

You can see based on the RStudio console output that the sum of all values of the setosa group is 250.3, the sum of the versicolor group is 296.8, and the sum of the virginica group is equal to 329.4.
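The aggregate function also offers a formula interface, which keeps the original column names in the output instead of Group.1 and x. The sketch below repeats the same group sum with that interface; the object name sum_formula is chosen here for illustration.

```r
data(iris)                                      # Load Iris data

sum_formula <- aggregate(Sepal.Length ~ Species, # Column ~ group indicator
                         data = iris,            # Specify data frame
                         FUN = sum)              # Specify function (i.e. sum)
sum_formula
#      Species Sepal.Length
# 1     setosa        250.3
# 2 versicolor        296.8
# 3  virginica        329.4
```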

Do you need more explanations on the computation of the sum based on a grouping variable with the aggregate function? Then have a look at the video on my YouTube channel, in which I explain the previous example in more detail.

 

 

Example 2: Sum by Group Based on dplyr Package

The dplyr package is a powerful R add-on package that many R users rely on for data manipulation. In case you also prefer to work within the dplyr framework, you can use the R syntax of this example for the computation of the sum by group.

First, we need to install and load the dplyr package in R:

install.packages("dplyr")                       # Install dplyr package
library("dplyr")                                # Load dplyr package

Now we can use the group_by and the summarise_at functions to get the summation by group:

iris %>%                                        # Specify data frame
  group_by(Species) %>%                         # Specify group indicator
  summarise_at(vars(Sepal.Length),              # Specify column
               list(name = sum))                # Specify function
 
# A tibble: 3 x 2
# Species       name
# <fct>         <dbl>
# 1 setosa      250.
# 2 versicolor  297.
# 3 virginica   329.

As you can see, the values are the same as in Example 1. They only look rounded because a tibble prints three significant digits by default; the stored sums are unchanged.
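In dplyr 1.0.0 and later, the summarise_at function is marked as superseded, and the same result can be written with summarise directly. The following sketch assumes such a dplyr version is installed; the column name sum_sepal_length is chosen here for illustration. Converting the result to a plain data frame prints the sums without the tibble formatting.

```r
library("dplyr")                                # Load dplyr package

sum_dplyr <- iris %>%                           # Specify data frame
  group_by(Species) %>%                         # Specify group indicator
  summarise(sum_sepal_length = sum(Sepal.Length))  # Specify column & function

as.data.frame(sum_dplyr)                        # Print without tibble formatting
```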

 

Further Resources & Summary

This tutorial showed how to calculate group sums in the R programming language. However, there is much more to learn about the addition of numeric values and about R programming in general. For that reason, you might want to have a look at some of the other R tutorials that I have published on my website.

In summary, this tutorial explained how to add values in order to compute the sum of a column or variable within groups, e.g. groups defined by dates, names, or countries. In case you have any further questions on this topic, please let me know in the comments.

 


15 Comments

  • Hey, can you do this with more than one group?

    For example, sum the column Durations by ID, Date and Activity, so that for each day you get the sum of each Activity for each ID?

  • Destiny Louise Bradley
    October 24, 2020 12:30 pm

    I worked it out 🙂

    Thank you for sharing these! Really so helpful!
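For the question about several grouping variables, here is a minimal sketch. The data frame and its values are invented for illustration; only the column names ID, Date, Activity, and Duration follow the comment above. Both the formula interface of aggregate and the group_by function accept more than one grouping variable.

```r
library("dplyr")                                # Load dplyr package

# Hypothetical example data
activity <- data.frame(
  ID       = c(1, 1, 1, 2, 2),
  Date     = as.Date(c("2020-10-01", "2020-10-01", "2020-10-02",
                       "2020-10-01", "2020-10-01")),
  Activity = c("run", "run", "run", "walk", "walk"),
  Duration = c(10, 20, 5, 30, 15)
)

# Base R: list all grouping variables on the right-hand side
sum_base <- aggregate(Duration ~ ID + Date + Activity,
                      data = activity,
                      FUN = sum)

# dplyr: list all grouping variables in group_by
sum_dplyr <- activity %>%
  group_by(ID, Date, Activity) %>%
  summarise(Duration = sum(Duration), .groups = "drop")
```

Both results contain one row per combination of ID, Date, and Activity that occurs in the data.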

  • Thanks buddy, it really works and it's very useful. I used to use the following structure:

    df %>%
    group_by(Year, Genre) %>%
    arrange(Year) %>%
    summarise(Total_ex = n())

    But n() seems to have disappeared recently, because R gives the following message:

    Error: `n()` must only be used inside dplyr verbs.

    • Hey W.A.C,

      Thank you for the kind words, glad to hear that the tutorial helped!

      Regarding the error message: Could you try to put dplyr:: in front of the summarise function? The summarise function exists in dplyr and in plyr and therefore R might be confused.

      Regards,

      Joachim
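To illustrate the suggestion, the sketch below prefixes every verb with dplyr:: so that a summarise function from another attached package (plyr, for instance) cannot be picked up by accident. The data frame is invented; the column names Year, Genre, and Total_ex follow the comment above.

```r
library("dplyr")                                # Load dplyr package

# Hypothetical example data
records <- data.frame(
  Year  = c(2019, 2019, 2020),
  Genre = c("Rock", "Rock", "Pop")
)

counts <- records %>%
  dplyr::group_by(Year, Genre) %>%              # Explicit dplyr namespace
  dplyr::summarise(Total_ex = dplyr::n(),       # Count rows per group
                   .groups = "drop")

counts
```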

  • Jessica Bryzek
    January 20, 2022 4:29 pm

    The code works, but I have other data columns I’d like to retain. When I do the aggregate function, the other columns disappear. Is there a way to retain other columns?

    • Hey Jessica,

      Please have a look at the two examples below:

      aggregate(. ~ Species, # Keep all variables
                iris,
                sum)
      #      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
      # 1     setosa        250.3       171.4         73.1        12.3
      # 2 versicolor        296.8       138.5        213.0        66.3
      # 3  virginica        329.4       148.7        277.6       101.3
      aggregate(. ~ Species, # Keep some variables
                iris[ , c("Sepal.Length", "Sepal.Width", "Species")],
                sum)
      #      Species Sepal.Length Sepal.Width
      # 1     setosa        250.3       171.4
      # 2 versicolor        296.8       138.5
      # 3  virginica        329.4       148.7

      Regards,
      Joachim
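If the goal is instead to keep every original row and column and attach the group sum as an additional column, one option is the Base R function ave. This is a sketch of that alternative reading of the question, not part of the reply above; the column name Sepal.Length.Sum is chosen here for illustration.

```r
data(iris)                                      # Load Iris data

iris_sum <- iris                                # Copy of the data
iris_sum$Sepal.Length.Sum <- ave(iris$Sepal.Length,  # Column to sum
                                 iris$Species,       # Group indicator
                                 FUN = sum)          # Function (i.e. sum)

head(iris_sum, 3)                               # All columns plus the group sum
```

Every row of a group carries the same value in the new column, so no rows or columns are lost.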

  • Hello Joachim

    Thanks for the code, it works well. I just have one issue: I don’t see the results in my console. Do you maybe know why this could be the case?

    Thanks in advance.
    Best,
    Marlene

  • I am trying to make my time series data continuous by filling in the missing values with this code.

    volume_data_top3 <- combinedads_till_May22
    final_Date <- as.Date(max(combinedads_till_May22$timestamp))
    Mod_data <- volume_data_top3 %>% group_by(key) %>%
    complete(key, timestamp = seq.Date(min(timestamp), final_Date, by = "month")) %>% ungroup()

    But then when I execute it I am getting an error like,

    Error in `dplyr::summarise()`:
    ! Problem while computing `..1 = complete(data = dplyr::cur_data(), …, fill = fill, explicit = explicit)`.
    ℹ The error occurred in group 1: key = "HA31-ACBL1".
    Caused by error in `grid_dots()`:
    ! `..1` must be a vector, not a function.

    This code is running in my colleague’s laptop but not in mine, I have uninstalled and then reinstalled R but still can’t find the solution to this error. Please guide me as to what I am doing wrong.

    • Hey Atharva,

      Since this code runs fine on your colleague's computer, it seems like there is a problem with the packages you have (or have not) installed/loaded on your computer.

      I would try to prefix the function names with the names of their packages (e.g. dplyr::ungroup).

      I hope this helps!

      Joachim
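One further thing that may be worth checking, as an assumption that the thread does not confirm: the tidyr documentation notes that complete operates within each group of a grouped data frame, so a grouping column cannot itself be completed. The sketch below therefore passes only timestamp to complete and uses explicit package prefixes. The data values are invented; key, timestamp, and final_Date follow the comment above.

```r
library("dplyr")                                # Load dplyr package
library("tidyr")                                # Load tidyr package

# Hypothetical example data
volume_data <- data.frame(
  key       = c("A", "A", "B"),
  timestamp = as.Date(c("2022-01-01", "2022-03-01", "2022-02-01")),
  volume    = c(1, 2, 3)
)
final_Date <- max(volume_data$timestamp)

filled <- volume_data %>%
  dplyr::group_by(key) %>%
  tidyr::complete(timestamp = seq.Date(min(timestamp),  # Start of each group
                                       final_Date,      # Common end date
                                       by = "month")) %>%
  dplyr::ungroup()

filled                                          # Missing months appear with NA
```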

  • Hello Joachim
    I need a function that sums the price column when the Gender column is NA, and this should be done for all columns. I tried to do that using a loop; I could calculate the sum and even the loop
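Since the comment breaks off, the following is only a guess at what is needed: a minimal sketch that sums price over the rows where a given column is NA, applied to every other column in turn. The data frame, the Age column, and the helper name sum_price_where_na are hypothetical.

```r
# Hypothetical example data
sales <- data.frame(
  Gender = c("f", NA, "m", NA),
  Age    = c(25, 31, NA, 40),
  price  = c(10, 20, 30, 5)
)

# Sum of price over the rows where the column x is NA
sum_price_where_na <- function(x) sum(sales$price[is.na(x)])

# Apply to all columns except price, without an explicit loop
na_sums <- sapply(sales[setdiff(names(sales), "price")], sum_price_where_na)
na_sums                                         # Gender = 25, Age = 30
```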

