Proportions with dplyr Package in R (Example) | Create Relative Frequency Table
In this R tutorial you’ll learn how to compute relative frequencies / proportions with the dplyr package.
The article consists of the following contents:
- Creating Example Data
- Example: Get Relative Frequencies of Data Frame in R
- Video & Further Resources
Let’s dig in…
Creating Example Data
The example data that we’ll use in this tutorial looks as follows:
set.seed(9876)                                  # Create random example data
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))
head(data)                                      # Print 6 rows of example data
#   x y
# 1 1 A
# 2 5 B
# 3 5 A
# 4 2 B
# 5 4 C
# 6 2 A
Our example data frame consists of 100 rows and two columns. The variable x contains the values 1, 2, 3, 4, and 5; and the variable y consists of the values A, B, and C.
Furthermore, we have to install and load the dplyr package:
install.packages("dplyr")                       # Install and load dplyr
library("dplyr")
In the following example, we’ll create a table that shows the relative frequencies / proportions of our example data.
Keep on reading!
Example: Get Relative Frequencies of Data Frame in R
In order to create a frequency table with the dplyr package, we can use a combination of the group_by, summarise, n, mutate, and sum functions. Have a look at the following R syntax:
data %>%                                        # Create tibble with frequencies
  group_by(x, y) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
# `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 15 x 4
# # Groups:   x [5]
#        x y         n  freq
#    <int> <chr> <int> <dbl>
#  1     1 A        10 0.476
#  2     1 B         7 0.333
#  3     1 C         4 0.190
#  4     2 A         7 0.438
#  5     2 B         5 0.312
#  6     2 C         4 0.25
#  7     3 A         4 0.286
#  8     3 B         4 0.286
#  9     3 C         6 0.429
# 10     4 A         9 0.474
# 11     4 B         6 0.316
# 12     4 C         4 0.211
# 13     5 A        12 0.4
# 14     5 B        10 0.333
# 15     5 C         8 0.267
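As you can see based on the output of the RStudio console, the previous R code returned a tibble containing each combination of our two variables x and y, the count n of each combination, and its relative frequency freq. Because summarise() drops only the last grouping variable (y), the result is still grouped by x. The column freq is therefore the proportion of each y value within its x group, and it sums to 1 within each value of x (e.g. 10 / (10 + 7 + 4) = 0.476 for x = 1 and y = A). Note that the previous R code is based on this thread on Stack Overflow.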
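To make the scaling of the proportions explicit, the following sketch checks that freq adds up to 1 within each value of x, and shows one way to compute proportions over the whole table instead. It assumes dplyr 1.0.0 or later (for the .groups argument); the object names within_x and overall are only illustrative.

```r
library("dplyr")

set.seed(9876)                                  # Same example data as above
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))

within_x <- data %>%                            # Proportions within each x
  group_by(x, y) %>%
  summarise(n = n(), .groups = "drop_last") %>% # Result stays grouped by x
  mutate(freq = n / sum(n))

within_x %>%                                    # One row per x; each total
  summarise(total = sum(freq))                  # should be 1 (up to rounding)

overall <- data %>%                             # Proportions over all rows
  count(x, y) %>%                               # Ungrouped counts in column n
  mutate(freq = n / sum(n))

sum(overall$freq)                               # Should be 1 (up to rounding)
```

Setting .groups = "drop_last" reproduces the default grouping behaviour shown above while silencing the message about grouped output.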
Video & Further Resources
I have recently published a video on my YouTube channel, which explains the contents of this post. You can find the video below:
In addition, you might read the other tutorials on statisticsglobe.com. I have created similar articles to the present tutorial on topics such as contingency tables, pivot tables, and lookup tables:
- How to Create Tables in R
- How to Create a Frequency Table in R
- Contingency Table in R
- prop.table Function in R
- How to Create a Pivot Table in R
- Lookup Table in R
- mutate & transmute R Functions of dplyr Package
- sum Function in R
- Introduction to dplyr
- The R Programming Language
In summary: In this article, I illustrated how to summarize categorical variables in a frequency / proportion table with the dplyr package in R programming. If you have additional comments or questions, please let me know in the comments section. Besides that, don’t forget to subscribe to my email newsletter in order to get updates on new articles.
6 Comments
Can you make a graph with the frequency table?
Hi Gisla,
Thank you for your comment. Could you specify in some more detail what such a graph should look like?
Regards,
Joachim
Hi Joachim,
Your script allows me to get a printed table in the console window. I would like the new columns (n and freq) to be added to my initial table (x and y). This way the results are stored and can be used for making plots.
How could I do that?
Many thanks in advance.
Bijgom
Hi Bijgom,
Thank you for the comment!
Unfortunately, it is not possible to add n and freq to the initial table, since n and freq have a different length than the number of rows of the initial table (i.e. 15 vs. 100).
However, you may store the results of the frequency table in a new table by adding data_freq <- in front of the R code:
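A minimal sketch of that assignment, assuming the example data and the pipeline from the tutorial above, and dplyr 1.0.0 or later for the .groups argument:

```r
library("dplyr")

set.seed(9876)                                  # Same example data as above
data <- data.frame(x = sample(1:5, 100, replace = TRUE),
                   y = sample(LETTERS[1:3], 100, replace = TRUE))

data_freq <- data %>%                           # Store frequency table
  group_by(x, y) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(freq = n / sum(n))

data_freq                                       # Print stored table
```

The stored object data_freq can then be passed to other functions, for example for plotting.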
I hope that helps!
Joachim
I have been trying to find a code that outputs the frequencies of multiple columns based on similar factor levels within each one of them to no avail. Say for instance my dataframe is df with 5 columns with three levels say “2 Mins” “3Mins” and “4Mins”. How do we tabulate frequencies and percentages associated with each outcome across all the columns?
Hey Fred,
Should the percentages sum up to 1 in each column or among all columns together?
Could you provide some example data, and a data frame containing the desired output?
Regards
Joachim