Count Observations by Factor Level in R (3 Examples)

In this tutorial, I’ll show how to return the count of each category of a factor in R programming.

Let’s dive right into the programming part:

Example Data

As a first step, I have to create some example data:

vec <- as.factor(c("A", "B", "A",      # Create example factor
                   "C", "C", "A",
                   "B", "C", "D"))
vec <- vec[-length(vec)]               # Drop last element (level D is kept)
vec                                    # Print example factor
# [1] A B A C C A B C
# Levels: A B C D

The previous output of the RStudio console shows the structure of our example data: It’s a factor vector consisting of eight vector elements.

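Note that our factor has four different factor levels: A, B, C, and D. However, the factor level D is empty. Since we removed the only D in the previous code, no element of vec takes this value, but the level itself is still defined.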
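
To see the difference between the defined levels and the levels that are actually used, we can compare the base R functions levels() and droplevels(). The following is a minimal sketch that rebuilds the example factor so it runs on its own:

```r
# Rebuild the example factor: the only "D" is removed, but level D is kept
vec <- as.factor(c("A", "B", "A", "C", "C", "A", "B", "C", "D"))
vec <- vec[-length(vec)]

levels(vec)                  # All defined levels, including the unused D
# [1] "A" "B" "C" "D"
nlevels(vec)                 # Number of defined levels
# [1] 4

vec_used <- droplevels(vec)  # Drop levels that no element uses
levels(vec_used)
# [1] "A" "B" "C"
nlevels(vec_used)
# [1] 3
```

Whether such empty levels show up in a frequency table depends on the counting function, as the following examples illustrate.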

Let’s count the occurrences of each of the categories of our factor.

Example 1: Get Frequency of Categories Using table() Function

In this example, I’ll explain how to count the number of values per level of a factor using the table() function, which is part of the base installation of the R programming language.

Have a look at the following R code and its output:

table(vec)                             # Applying table function
# vec
# A B C D 
# 3 2 3 0

As you can see, the output is a frequency table. Its header identifies the four factor levels of our categorical variable (i.e. A, B, C, and D). The row below shows how often each of these levels appears in our data (i.e. A appears three times, B two times, C three times, and D zero times). In contrast to the functions shown in the following examples, table() also reports the empty level D.
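
If we need the counts for further calculations rather than only for printing, we can index the table by level name or convert it to a data frame. As another base R option, summary() applied to a factor also returns the count of every level, including empty ones. The following minimal sketch rebuilds the example factor so the snippet runs on its own:

```r
vec <- as.factor(c("A", "B", "A", "C", "C", "A", "B", "C", "D"))
vec <- vec[-length(vec)]              # Example factor with empty level D

counts <- table(vec)                  # Frequency table of all levels
counts[["A"]]                         # Count of a single level
# [1] 3

counts_df <- as.data.frame(counts)    # One row per level
counts_df
#   vec Freq
# 1   A    3
# 2   B    2
# 3   C    3
# 4   D    0

summary(vec)                          # Base R alternative for factors
# A B C D 
# 3 2 3 0
```

The data frame version is often more convenient when the counts should be merged with other data or passed on to plotting functions.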

Looks good! However, R also provides many add-on packages that can produce frequency tables. In the following examples, I’ll explain two of them, so keep on reading!

Example 2: Get Frequency of Categories Using count() Function of dplyr Package

In this example, I’ll show how to use the dplyr package to count the number of observations by factor levels.

If we want to use the functions of the dplyr package, we first have to install and load dplyr:

install.packages("dplyr")              # Install dplyr package
library("dplyr")                       # Load dplyr

Furthermore, we have to convert our factor vector to a data.frame:

data_vec <- data.frame(vec)            # Create data frame
data_vec                               # Print data frame
#   vec
# 1   A
# 2   B
# 3   A
# 4   C
# 5   C
# 6   A
# 7   B
# 8   C

Now, we can apply the count function of the dplyr package to create a frequency table:

dplyr::count(data_vec, vec)            # Applying count function
#   vec       n
#   <fct> <int>
# 1 A         3
# 2 B         2
# 3 C         3

Note that, by default, the previous table omits empty categories: the empty factor level D does not appear.
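
If the empty level should be reported with a count of zero, the count() function of recent dplyr versions accepts a .drop argument. Setting .drop = FALSE keeps factor levels that have no observations. The following is a minimal sketch that rebuilds the example data so it runs on its own (depending on the dplyr version, the result may print as a data frame or as a tibble, so no printed output is shown here):

```r
library("dplyr")

vec <- as.factor(c("A", "B", "A", "C", "C", "A", "B", "C", "D"))
vec <- vec[-length(vec)]                # Level D exists but is unused
data_vec <- data.frame(vec)

# .drop = FALSE keeps empty factor levels in the result (with n = 0)
counts_all <- dplyr::count(data_vec, vec, .drop = FALSE)
counts_all
```

The result should contain one row per factor level, so D is listed with a count of 0.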

Example 3: Get Frequency of Categories Using data.table Package

This example explains how to use the data.table package to count the number of cases in each category.

To use the functions of the data.table package, we first need to install and load it:

install.packages("data.table")         # Install data.table package
library("data.table")                  # Load data.table

Now, we can use the following R code to return a table with frequency counts:

setDT(data_vec)[ , .N, keyby = vec]    # Convert to data.table (by reference) & count
#    vec N
# 1:   A 3
# 2:   B 2
# 3:   C 3

Note that the data.table code also omits empty categories, i.e. the factor level D is not returned.
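
One possible way to also report empty levels with data.table is to join the counts onto a table that contains all factor levels and to replace the resulting NA values by 0. This is a sketch of one approach, not the only one; the object names counts, all_levels, and counts_all are chosen for illustration only:

```r
library("data.table")

vec <- as.factor(c("A", "B", "A", "C", "C", "A", "B", "C", "D"))
vec <- vec[-length(vec)]                  # Level D exists but is unused
dt <- data.table(vec = vec)

counts <- dt[ , .N, keyby = vec]          # Counts of the used levels only
all_levels <- data.table(vec = factor(levels(vec), levels = levels(vec)))

# Join: one row per row of all_levels; unmatched levels get N = NA
counts_all <- counts[all_levels, on = "vec"]
counts_all[is.na(N), N := 0L]             # Replace NA by 0 (by reference)
counts_all
```

Because the join keeps every row of all_levels, the result lists all four levels in their original order.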

Video & Further Resources

Do you need further information on the content of this tutorial? Then I recommend watching the following video on my YouTube channel, in which I explain the content of this tutorial in R.

Furthermore, you could read the other articles on this website, where I have already published numerous tutorials.

To summarize: At this point you should know how to get the frequency counts of factor vectors and variables in the R programming language. Don’t hesitate to let me know in the comments section below if you have further questions.

2 Comments

  • Thanks, this was helpful. Now I’m wondering how I can use these counts in a graph.

    For example, let’s say I have a healthcare data set on patients with diabetes, with variables:
    -DM (History of Diabetes, Factor, 0/1)
    -Sex(M/F, Factor, 0/1),
    -Vital_Status (Alive/Dead, Factor,0/1)

    And I would like to create a bar graph with:
    x=DM y/n
    y=Count (n) of alive/dead
    color=gender

    • Hey Danly,

      Thanks for the kind comment!

      Could you illustrate the structure of your data set in some more detail? What is returned when you execute the following R code:

      head(your_data)

      Regards,
      Joachim