Count Observations by Factor Level in R (3 Examples)
In this tutorial, I’ll show how to return the count of each category of a factor in R programming.
Let’s dive right into the programming part:
Example Data
As a first step, I have to create some example data:
vec <- as.factor(c("A", "B", "A",       # Create example factor
                   "C", "C", "A",
                   "B", "C", "D"))
vec <- vec[- length(vec)]               # Remove last element (the only D)
vec                                     # Print example factor
# [1] A B A C C A B C
# Levels: A B C D
The previous output of the RStudio console shows the structure of our example data: It’s a factor vector consisting of eight vector elements.
Note that our factor has four different factor levels: A, B, C, and D. The factor level D is empty, because we removed the only D observation when we dropped the last element of the vector.
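To check this, the levels of the factor can be inspected directly. The following lines are only a small sketch based on the example data above; the object name vec_dropped is chosen for illustration and is not used elsewhere in this tutorial:

```r
# Recreate the example factor from above
vec <- as.factor(c("A", "B", "A", "C", "C", "A", "B", "C", "D"))
vec <- vec[-length(vec)]            # Remove the only D observation

levels(vec)                         # All four levels are still stored
nlevels(vec)                        # Number of levels
length(vec)                         # Number of vector elements

vec_dropped <- droplevels(vec)      # Copy of the factor without unused levels
levels(vec_dropped)                 # Only A, B, and C remain
```

If the empty level is not needed, droplevels() removes it. In the remainder of this tutorial, we keep the level D to see how each function treats empty categories.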
Let’s count the occurrences of each of the categories of our factor.
Example 1: Get Frequency of Categories Using table() Function
In this example, I’ll explain how to count the number of values per level in a given factor using the table function provided by the basic installation of the R programming language.
Have a look at the following R code and its output:
table(vec)                              # Applying table function
# vec
# A B C D
# 3 2 3 0
As you can see, the output is a frequency table. The header of this table identifies the four factor levels of our categorical variable (i.e. A, B, C, and D). The row below shows how often each of these values appears in our data (i.e. A occurs three times, B occurs two times, C occurs three times, and D occurs zero times). Note that the table function also reports the empty factor level D.
Looks good! However, the R programming language provides many add-on packages that are able to produce frequency tables. In the following examples, I’ll explain two of those packages. So keep on reading!
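In case the counts are needed for further processing, the frequency table can be converted to a data frame or to relative frequencies. The code below is a minimal sketch using base R only; the object names freq, freq_df, and shares are assumptions made for this illustration:

```r
# Example factor with the empty level D (same values as above)
vec <- factor(c("A", "B", "A", "C", "C", "A", "B", "C"),
              levels = c("A", "B", "C", "D"))

freq <- table(vec)                  # Frequency table

freq_df <- as.data.frame(freq,      # Convert table to data frame
                         responseName = "n")
freq_df                             # Columns vec and n, one row per level

shares <- prop.table(freq)          # Relative frequencies
shares
```

The data frame version keeps one row for each level, including the empty level D with a count of zero.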
Example 2: Get Frequency of Categories Using count() Function of dplyr Package
In this example, I’ll show how to use the dplyr package to count the number of observations by factor levels.
If we want to use the functions of the dplyr package, we first have to install and load dplyr:
install.packages("dplyr")               # Install dplyr package
library("dplyr")                        # Load dplyr
Furthermore, we have to convert our factor vector to a data.frame:
data_vec <- data.frame(vec)             # Create data frame
data_vec                                # Print data frame
#   vec
# 1   A
# 2   B
# 3   A
# 4   C
# 5   C
# 6   A
# 7   B
# 8   C
Now, we can apply the count function of the dplyr package to create a frequency table:
dplyr::count(data_vec, vec)             # Applying count function
#   vec       n
#   <fct> <int>
# 1 A         3
# 2 B         2
# 3 C         3
Note that the previous table doesn’t show empty categories, i.e. the empty factor level D is not shown.
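If the empty level should appear in the result as well, the count function offers the .drop argument. The following lines are a sketch under the assumption that a reasonably recent dplyr version is installed; the object name counts_all is hypothetical:

```r
library("dplyr")                    # Load dplyr

# Example data with the empty level D (same values as above)
vec <- factor(c("A", "B", "A", "C", "C", "A", "B", "C"),
              levels = c("A", "B", "C", "D"))
data_vec <- data.frame(vec)

counts_all <- count(data_vec, vec,  # Keep empty factor levels
                    .drop = FALSE)
counts_all                          # D is listed with n = 0
```

With .drop = FALSE, the result contains one row per factor level instead of one row per observed value.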
Example 3: Get Frequency of Categories Using data.table Package
This example explains how to use the data.table package to count the number of cases in each category.
To use the functions of the data.table package, we first need to install and load it:
install.packages("data.table")          # Install data.table package
library("data.table")                   # Load data.table
Now, we can use the following R code to return a table with frequency counts:
setDT(data_vec)[ , .N, keyby = vec]     # Using data.table package
#    vec N
# 1:   A 3
# 2:   B 2
# 3:   C 3
Note that the data.table approach doesn’t return the count of empty categories either.
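One possible way to add the empty level to such a result is to merge the counts with a table of all factor levels and to replace the resulting missing value by zero. This is only a sketch of one workaround, not a dedicated data.table feature; the object names dt, counts, all_levels, and counts_all are assumptions for this illustration:

```r
library("data.table")               # Load data.table

# Example data with the empty level D (same values as above)
dt <- data.table(vec = factor(c("A", "B", "A", "C", "C", "A", "B", "C"),
                              levels = c("A", "B", "C", "D")))

counts <- dt[ , .N, keyby = vec]    # Counts of observed levels only

all_levels <- data.table(           # One row per factor level
  vec = factor(levels(dt$vec), levels = levels(dt$vec)))

counts_all <- merge(all_levels,     # Keep all levels, even unmatched ones
                    counts,
                    by = "vec",
                    all.x = TRUE)
counts_all[is.na(N), N := 0L]       # Empty levels get a count of zero
counts_all
```

The merge keeps every row of all_levels, so the level D is retained and its missing count is set to 0 afterwards.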
Video & Further Resources
Do you need further information on the content of this tutorial? Then I recommend watching the following video on my YouTube channel. In the video tutorial, I explain the content of this tutorial in R.
Furthermore, you could read the other articles on this website. I have released numerous articles already.
To summarize: At this point you should know how to get the frequency counts of factor vectors and variables in the R programming language. Don’t hesitate to let me know in the comments section below, in case you have further questions.
2 Comments
Thanks, this was helpful. Now I’m wondering how I can use these counts in a graph.
For example, let’s say I have a healthcare data set on patients with diabetes, with variables:
- DM (History of Diabetes, Factor, 0/1)
- Sex (M/F, Factor, 0/1)
- Vital_Status (Alive/Dead, Factor, 0/1)
And I would like to create a bar graph with:
x=DM y/n
y=Count (n) of alive/dead
color=gender
Hey Danly,
Thanks for the kind comment!
Could you illustrate the structure of your data set in some more detail? What is returned when you execute the following R code:
Regards,
Joachim