Numbering Rows within Groups of Data Frame in R (2 Examples)

 

This article illustrates how to add a counter to each group of a data frame in the R programming language.

Let’s start right away…

 

Constructing Example Data

In the examples of this R tutorial, we’ll use the following example data frame:

data <- data.frame(x = 1:10,                    # Create example data
                   group = c(rep("g1", 3),
                             rep("g2", 5),
                             rep("g3", 2)))
data                                            # Print example data
#     x group
# 1   1    g1
# 2   2    g1
# 3   3    g1
# 4   4    g2
# 5   5    g2
# 6   6    g2
# 7   7    g2
# 8   8    g2
# 9   9    g3
# 10 10    g3

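As the output in the RStudio console shows, our example data contains ten rows and two columns. The first column x is an integer variable, and the second column group is the grouping variable. Note that group is stored as a factor in R versions before 4.0.0 and as a character vector from R 4.0.0 onward, because the default of the stringsAsFactors argument changed. The examples below work in both cases.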
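If you want the grouping column to be a factor regardless of your R version, the following minimal sketch may help. The object name data_fct is just an illustration and not part of the original example:

```r
# Sketch: same example data, with the grouping column forced to be a factor
data_fct <- data.frame(x = 1:10,
                       group = c(rep("g1", 3),
                                 rep("g2", 5),
                                 rep("g3", 2)),
                       stringsAsFactors = TRUE)
sapply(data_fct, class)                         # Class of each column
table(data_fct$group)                           # Number of rows per group
```

The call to table is a quick way to check how many rows each group contains before adding a counter.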

 

Example 1: Numbering Rows of Data Frame Groups with Base R

If we want to create a column containing a counter for each group of our data frame, we can use the ave function as shown in the following R code:

data1 <- data                                   # Replicate example data
data1$numbering <- ave(data1$x,                 # Create numbering variable
                       data1$group,
                       FUN = seq_along)
data1                                           # Print updated data
#     x group numbering
# 1   1    g1         1
# 2   2    g1         2
# 3   3    g1         3
# 4   4    g2         1
# 5   5    g2         2
# 6   6    g2         3
# 7   7    g2         4
# 8   8    g2         5
# 9   9    g3         1
# 10 10    g3         2

The updated data frame consists of the same columns as our example data plus a new variable that numbers the rows within each group, restarting at 1 for every group.
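One detail worth keeping in mind: ave writes its result back into a copy of its first argument, so a character first argument would return the counter as character strings. A possible workaround, sketched below with a hypothetical data frame data_mixed whose rows are not sorted by group, is to pass a plain row index as the first argument:

```r
# Hypothetical unsorted data without any numeric column
data_mixed <- data.frame(id = c("a", "b", "c", "d", "e"),
                         group = c("g2", "g1", "g2", "g1", "g2"))

# Use a row index as first argument so that ave() returns integers
data_mixed$numbering <- ave(seq_along(data_mixed$group),
                            data_mixed$group,
                            FUN = seq_along)
data_mixed                                      # Print updated data
```

The rows keep their original order, and the counter reflects the order of appearance within each group.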

 

Example 2: Numbering Rows of Data Frame Groups with dplyr Package

Example 2 shows how to add a numbering sequence variable with the dplyr package in R. First, we need to install and load the package:

install.packages("dplyr")                       # Install and load dplyr
library("dplyr")

Now, we can use the group_by and the mutate functions to create a counter within each group:

data2 <- data                                   # Replicate example data
data2 <- data2 %>%                              # Create numbering variable
  group_by(group) %>%
  mutate(numbering = row_number())
data2                                           # Print updated data
# # A tibble: 10 x 3
# # Groups:   group [3]
#        x group numbering
#    <int> <fct>     <int>
#  1     1 g1            1
#  2     2 g1            2
#  3     3 g1            3
#  4     4 g2            1
#  5     5 g2            2
#  6     6 g2            3
#  7     7 g2            4
#  8     8 g2            5
#  9     9 g3            1
# 10    10 g3            2

The numbering is the same as in Example 1, but the data is now stored as a grouped tibble rather than a plain data frame.
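If the grouping is not needed afterwards, it can be dropped again. The following sketch assumes that dplyr is already installed. It writes the function calls with an explicit dplyr:: prefix, which is one way to avoid clashes when other loaded packages export functions with the same names. The object name data3 is only illustrative:

```r
library("dplyr")                                # Load dplyr (for the pipe)

data3 <- data.frame(x = 1:10,                   # Recreate example data
                    group = c(rep("g1", 3),
                              rep("g2", 5),
                              rep("g3", 2)))
data3 <- data3 %>%                              # Create numbering variable
  dplyr::group_by(group) %>%
  dplyr::mutate(numbering = dplyr::row_number()) %>%
  dplyr::ungroup() %>%                          # Drop the grouping
  as.data.frame()                               # Convert back to data frame
data3                                           # Print updated data
```

Whether to convert back to a plain data frame is a matter of preference. Leaving the result as an ungrouped tibble works just as well for most purposes.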

 


To summarize: In this tutorial, I showed how to generate a sequential counter ID within each group of a data frame in R programming. In case you have additional questions, don’t hesitate to let me know in the comments below.

 



2 Comments

  • Ashley Buckner
    March 21, 2022 9:51 am

    Joachim, Thank you for this.
    When I tried the example 2 code on my data, however, I got the error
    Error: `n()` must only be used inside dplyr verbs.
    Putting ‘dplyr::’ in front of the function calls solved this to make sure they came from the right package (okay, bit of a scatter-gun solution but it did seem to work…).

    • Hey Ashley,

      Thank you very much for the hint! This is a common problem when loading multiple packages containing functions with the same function names.

      Have a look here for more details on your specific error. Putting dplyr in front of the function names is probably the easiest fix for this.

      Regards,
      Joachim

