Proportions with dplyr Package in R (Example) | Create Relative Frequency Table

 

In this R tutorial you’ll learn how to compute relative frequencies / proportions with the dplyr package.

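The article consists of the following contents:

1) Creating Example Data
2) Example: Get Relative Frequencies of Data Frame in R
3) Video & Further Resources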

Let’s dig in…

 

Creating Example Data

The example data that we’ll use in this tutorial looks as follows:

set.seed(9876)                        # Create random example data
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))
head(data)                            # Print first 6 rows of example data
#   x y
# 1 5 C
# 2 2 A
# 3 1 C
# 4 3 B
# 5 3 B
# 6 2 C

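Our example data frame consists of 100 rows and two columns. The variable x contains integer values from 1 to 5, and the variable y contains the letters A, B, and C. Note that y is stored as a factor in R versions before 4.0.0 (shown as <fct> in the output further below) and as a character column in R 4.0.0 and later, because the default of stringsAsFactors changed to FALSE.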

Furthermore, we have to install and load the dplyr package:

install.packages("dplyr")              # Install and load dplyr
library("dplyr")

In the following example, we'll create a table representing the relative frequencies / proportions of our example data.

Keep on reading!

 

Example: Get Relative Frequencies of Data Frame in R

To create a relative frequency table with the dplyr package, we can combine the group_by, summarise, n, mutate, and sum functions using the %>% pipe. Have a look at the following R syntax:

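data %>%                               # Create tibble with frequencies
  group_by(x, y) %>%                   # Group by each x-y combination
  summarise(n = n()) %>%               # Count rows per combination (drops grouping by y)
  mutate(freq = n / sum(n))            # Divide each count by the total within its x value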
# # A tibble: 15 x 4
# # Groups:   x [5]
#        x y         n  freq
#    <int> <fct> <int> <dbl>
#  1     1 A         6 0.316
#  2     1 B         6 0.316
#  3     1 C         7 0.368
#  4     2 A         4 0.222
#  5     2 B         5 0.278
#  6     2 C         9 0.5  
#  7     3 A         8 0.296
#  8     3 B        10 0.370
#  9     3 C         9 0.333
# 10     4 A         6 0.333
# 11     4 B         9 0.5  
# 12     4 C         3 0.167
# 13     5 A         5 0.278
# 14     5 B         8 0.444
# 15     5 C         5 0.278

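As you can see based on the output of the RStudio console, the previous R code returned a tibble containing each combination of our two variables x and y that occurs in the data, the count n of each combination, and its relative frequency freq. Because summarise() drops only the last grouping variable, the tibble is still grouped by x (see "Groups: x [5]"). The freq column is therefore the proportion of each y value within a given value of x, and the freq values sum to 1 within each level of x (e.g. 0.316 + 0.316 + 0.368 = 1 for x = 1). Note that the previous R code is based on this thread on Stack Overflow.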
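Whether freq is a proportion within each x value or across all 100 rows depends on the grouping that is left after summarise(). A minimal sketch, assuming dplyr 1.0.0 or later (where summarise() has a .groups argument), makes both variants explicit and adds a base R cross-check with prop.table(). The object names freq_within_x, freq_overall, and tab_within_x are only illustrative:

```r
library(dplyr)

set.seed(9876)                          # Same example data as above
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))

# Proportion of each y value within each level of x (as in the code above)
freq_within_x <- data %>%
  group_by(x, y) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # Keep grouping by x
  mutate(freq = n / sum(n)) %>%                  # sum(n) = number of rows per x
  ungroup()

# Proportion of each x-y combination among all 100 rows
freq_overall <- data %>%
  group_by(x, y) %>%
  summarise(n = n(), .groups = "drop") %>%       # Remove all grouping
  mutate(freq = n / sum(n))                      # sum(n) = 100

# Base R cross-check: row-wise proportions of the x-y contingency table
tab_within_x <- prop.table(table(data$x, data$y), margin = 1)

print(freq_within_x)
print(freq_overall)
print(tab_within_x)
```

Each row of tab_within_x sums to 1 and should match the freq values of freq_within_x, whereas the freq column of freq_overall sums to 1 across all combinations.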

 

Video & Further Resources

I have recently published a video on my YouTube channel, which explains the contents of this post. You can find the video below:

 


 

In addition, you might have a look at the other tutorials on statisticsglobe.com.

In summary: In this article, I illustrated how to summarize categorical variables in a frequency / proportion table with the dplyr package in R programming. If you have additional comments or questions, please let me know in the comments section. Besides that, don’t forget to subscribe to my email newsletter in order to get updates on new articles.

 



4 Comments

  • Can you make a graph with the frequency table?

  • Hi Joachim,

    your script prints a table in the console window. I would like the new columns (n and freq) to be added to my initial table (x and y). This way the results are stored and can be used for making plots.
    How could I do that?
    Many thanks in advance.
    Bijgom

    • Hi Bijgom,

      Thank you for the comment!

      Unfortunately, n and freq cannot simply be added as new columns to the initial table, since the frequency table has a different number of rows than the initial table (i.e. 15 vs. 100).

      However, you may store the results of the frequency table in a new table by adding data_freq <- in front of the R code:

      data_freq <- data %>%
        group_by(x, y) %>%
        summarise(n = n()) %>%
        mutate(freq = n / sum(n))

      I hope that helps!

      Joachim

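Regarding the question above about adding n and freq to the initial table: the 15-row frequency table cannot be bound to the 100 rows as-is, but its values can be repeated for every row with the same x and y. A minimal sketch, assuming dplyr's left_join() and recreating data and data_freq so the code runs on its own (data_with_freq is a hypothetical name), joins the frequency table back onto the original data by x and y:

```r
library(dplyr)

set.seed(9876)                          # Same example data as above
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))

# 15-row frequency table (proportions of y within each x)
data_freq <- data %>%
  group_by(x, y) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()                             # Drop the remaining grouping by x

# Sketch: left_join() repeats n and freq for each row with the same x and y
data_with_freq <- data %>%
  left_join(data_freq, by = c("x", "y"))

head(data_with_freq)
```

Each of the 100 rows of data_with_freq then carries the count and the within-x proportion of its own x-y combination, which can be used as input for plots.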
