Proportions with dplyr Package in R (Example) | Create Relative Frequency Table

 

In this R tutorial you’ll learn how to compute relative frequencies / proportions with the dplyr package.

Let’s dig in…

 

Creating Example Data

The example data that we’ll use in this tutorial looks as follows:

set.seed(9876)                        # Create random example data
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))
head(data)                            # Print 6 rows of example data 
#   x y
# 1 1 A
# 2 5 B
# 3 5 A
# 4 2 B
# 5 4 C
# 6 2 A

Our example data frame consists of 100 rows and two columns. The variable x contains the integer values 1 to 5, and the variable y contains the letters A, B, and C.

Furthermore, we have to install and load the dplyr package:

install.packages("dplyr")              # Install and load dplyr
library("dplyr")

In the following example, we’ll create a table representing the relative frequencies / proportions of our example data.

Keep on reading!

 

Example: Get Relative Frequencies of Data Frame in R

In order to create a relative frequency table with the dplyr package, we can combine the group_by, summarise, n, mutate, and sum functions. Have a look at the following R syntax:

data %>%                               # Create tibble with frequencies
  group_by(x, y) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
# `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 15 x 4
# # Groups:   x [5]
#        x y         n  freq
#    <int> <chr> <int> <dbl>
#  1     1 A        10 0.476
#  2     1 B         7 0.333
#  3     1 C         4 0.190
#  4     2 A         7 0.438
#  5     2 B         5 0.312
#  6     2 C         4 0.25 
#  7     3 A         4 0.286
#  8     3 B         4 0.286
#  9     3 C         6 0.429
# 10     4 A         9 0.474
# 11     4 B         6 0.316
# 12     4 C         4 0.211
# 13     5 A        12 0.4  
# 14     5 B        10 0.333
# 15     5 C         8 0.267

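As you can see based on the output of the RStudio console, the previous R code returned a tibble containing each combination of our two variables x and y that occurs in the data (here, all 15 possible combinations occur), the count n of each combination, and its relative frequency freq. Note that freq is the proportion of each y value within its level of x, not within the whole data set: summarise() drops only the last grouping variable (y), so its output is still grouped by x (see the message and the "Groups: x [5]" line), and sum(n) is therefore computed separately for each value of x. For instance, the first three freq values 0.476, 0.333, and 0.190 are 10/21, 7/21, and 4/21, which sum to 1. The previous R code is based on a thread on Stack Overflow.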
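To make this grouping behavior explicit, here is a minimal sketch that sets the .groups argument mentioned in the message above (it assumes dplyr version 1.0.0 or newer, where summarise() has this argument). With .groups = "drop_last" the result matches the table above (proportions within each x, without the message); with .groups = "drop" the result is ungrouped, so sum(n) runs over all 100 rows and freq becomes the share of each x/y combination in the whole data set. The object names freq_within_x and freq_overall are only illustrative.

```r
library("dplyr")

set.seed(9876)                                   # same example data as above
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))

# Proportions of y within each level of x (the table above);
# setting .groups explicitly suppresses the message
freq_within_x <- data %>%
  group_by(x, y) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # result stays grouped by x
  mutate(freq = n / sum(n))                      # sum(n) is computed per x

# Proportions of each x/y combination among all 100 rows
freq_overall <- data %>%
  group_by(x, y) %>%
  summarise(n = n(), .groups = "drop") %>%       # result is ungrouped
  mutate(freq = n / sum(n))                      # sum(n) is 100 here

freq_within_x %>% summarise(total = sum(freq))   # one total (of 1) per x
sum(freq_overall$freq)                           # a single total of 1
```

The two results differ only in the grouping that is still active when mutate() is called, which is why it pays to check the "Groups:" line of a printed tibble before interpreting freq.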

 

Video & Further Resources

I have recently published a video on my YouTube channel that explains the contents of this post.

 

 

In addition, you might want to read the other tutorials on statisticsglobe.com. I have published articles similar to the present tutorial on topics such as contingency tables, pivot tables, and lookup tables.

 

In summary: In this article, I illustrated how to summarize categorical variables in a frequency / proportion table with the dplyr package in R programming. If you have additional comments or questions, please let me know in the comments section. Besides that, don’t forget to subscribe to my email newsletter in order to get updates on new articles.

 



6 Comments

  • Can you make a graph with the frequency table?

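Regarding the question above: one way to plot the proportion table is a stacked bar chart. The following is a minimal sketch that assumes the ggplot2 package is installed (it is not used elsewhere in this tutorial). It stores the frequency table as data_freq and draws one bar per value of x, split by y; because freq sums to 1 within each x, every bar reaches a height of 1.

```r
library("dplyr")
library("ggplot2")                        # assumption: ggplot2 is installed

set.seed(9876)                            # same example data as in the tutorial
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))

data_freq <- data %>%                     # proportions of y within each x
  group_by(x, y) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(freq = n / sum(n))

p <- ggplot(data_freq, aes(x = factor(x), y = freq, fill = y)) +
  geom_col() +                            # stacked bars, one per value of x
  labs(x = "x", y = "Proportion of y within x", fill = "y")
print(p)
```

Using geom_col() rather than geom_bar() lets the bar heights come directly from the precomputed freq column instead of being counted again by ggplot2.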
  • Hi Joachim,

    Your script allows me to get a printed table in the console window. I would like the new columns (n and freq) to be added to my initial table (x and y). This way, the results are stored and can be used for making plots.
    How could I do that?
    Many thanks in advance.
    Bijgom

    • Hi Bijgom,

      Thank you for the comment!

      Unfortunately, it is not possible to add the summarized n and freq columns directly to the initial table, since the summarized table has a different number of rows than the initial table (i.e. 15 vs. 100).

      However, you may store the results of the frequency table in a new table by adding data_freq <- in front of the R code:

      data_freq <- data %>%
        group_by(x, y) %>%
        summarise(n = n()) %>%
        mutate(freq = n / sum(n))

      I hope that helps!

      Joachim

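If the goal is instead to keep all 100 original rows and attach the counts and proportions to them, a possible sketch uses dplyr’s add_count() function. Each row then receives n, the number of rows with the same x/y combination, and freq, the share of that combination within the row’s value of x. These are the same values as in the summarized table, repeated for every matching row. The helper column n_x is only an intermediate step and is named here for illustration.

```r
library("dplyr")

set.seed(9876)                            # same example data as in the tutorial
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))

data_full <- data %>%
  add_count(x, y, name = "n") %>%         # rows sharing this row's x/y combination
  add_count(x, name = "n_x") %>%          # rows sharing this row's x value
  mutate(freq = n / n_x) %>%              # proportion of the combination within x
  select(-n_x)                            # drop the helper column

head(data_full)                           # still 100 rows, now with n and freq
```

Because the counts are repeated on every row, data_full can be used directly for plotting or further modeling, but summing n or freq over its rows would count each combination several times.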
  • I have been trying, to no avail, to find code that outputs the frequencies of multiple columns that share the same factor levels. Say, for instance, my data frame is df with 5 columns, each with the three levels “2 Mins”, “3Mins”, and “4Mins”. How do we tabulate the frequencies and percentages associated with each outcome across all the columns?

    • Hey Fred,

      Should the percentages sum up to 1 in each column or among all columns together?

      Could you provide some example data, and a data frame containing the desired output?

      Regards

      Joachim

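Since no example data were provided for this question, the following sketch uses hypothetical data: a data frame df with five columns that all take the levels “2 Mins”, “3Mins”, and “4Mins”. It assumes the tidyr package is installed for pivot_longer(), stacks all columns into one long table with one row per cell, and then computes percentages both ways: within each column and across all columns together.

```r
library("dplyr")
library("tidyr")                          # assumption: tidyr is installed

# Hypothetical example data: 5 columns sharing the same three levels
set.seed(123)
lvls <- c("2 Mins", "3Mins", "4Mins")
df <- as.data.frame(replicate(5, sample(lvls, 20, replace = TRUE)))

# Stack all columns into one long table (column name + level per cell)
df_long <- df %>%
  pivot_longer(everything(), names_to = "column", values_to = "level")

# Percentages that sum to 100 within each column
pct_by_column <- df_long %>%
  count(column, level) %>%
  group_by(column) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

# Percentages that sum to 100 across all columns together
pct_overall <- df_long %>%
  count(level) %>%
  mutate(pct = 100 * n / sum(n))

pct_by_column
pct_overall
```

The only difference between the two tables is whether the counts are grouped by column before the percentages are computed, which corresponds to the two readings of the question.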
