Rank Functions of dplyr Package in R (row_number, ntile, min_rank, dense_rank, percent_rank & cume_dist)
In this tutorial, I’ll illustrate how to apply the rank functions of the dplyr package in the R programming language. The rank functions of dplyr are row_number, ntile, min_rank, dense_rank, percent_rank, and cume_dist.
The tutorial consists of six examples, each of which explains one of the rank functions. More specifically, it contains the following sections:
- Example Data
- Example 1: row_number Function
- Example 2: ntile Function
- Example 3: min_rank Function
- Example 4: dense_rank Function
- Example 5: percent_rank Function
- Example 6: cume_dist Function
- Video & Further Resources
If you want to learn more about these functions, keep reading:
In this R tutorial, we’ll apply the rank functions of the dplyr add-on package to the following example vector:
x <- c(4, 1, 5, 2, 3, 3) # Create example vector
Our vector consists of six numeric values ranging from 1 to 5, and the value 3 appears twice.
Furthermore, we need to install and load the dplyr package:
install.packages("dplyr") # Install dplyr package
library("dplyr") # Load dplyr package
Now, we can move on to the examples.
Example 1: row_number Function
Example 1 explains how to use the row_number function in R. Have a look at the following R code:
row_number(x) # Apply row_number function
# 5 1 6 2 3 4
The row_number function returns the rank of each element of our input vector. Note that ties are broken by position: the value 3, which appears twice, receives two different ranks, and the second 3 is ranked one position higher than the first one (4 instead of 3).
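To make the tie-breaking rule concrete, here is a small base R sketch (not dplyr's own code) that reproduces the output above with rank() and ties.method = "first", which breaks ties by order of appearance. The equivalence is assumed for NA-free input and is only checked here for our example vector.

```r
# Example vector from above
x <- c(4, 1, 5, 2, 3, 3)

# Base R ranking where ties are broken by position of appearance;
# assumed to match dplyr::row_number() for input without NAs
rn <- rank(x, ties.method = "first")
print(rn)
# [1] 5 1 6 2 3 4

# The two 3s (positions 5 and 6) receive different ranks
print(rn[x == 3])
# [1] 3 4
```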
Example 2: ntile Function
In the second example, you’ll learn how to apply the ntile function. Of the rank functions covered in this tutorial, ntile is the only one that takes two arguments: the input vector (i.e. x) and an integer (here 3) that defines the number of groups the vector is split into.
ntile(x, 3) # Apply ntile function
# 3 1 3 1 2 2
The two lowest values of our input vector (i.e. 1 and 2) are assigned to group 1, the two occurrences of the value 3 are assigned to group 2, and the two highest values (i.e. 4 and 5) are assigned to group 3.
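The grouping rule can be sketched in base R for the case where the vector length is a multiple of the number of groups: rank the values with ties broken by position, then map ranks 1 and 2 to group 1, ranks 3 and 4 to group 2, and so on. Note that ntile_sketch() is a hypothetical helper written for illustration; it is only meant to agree with dplyr::ntile() when length(x) divides evenly by n, and it does not reproduce how dplyr distributes leftover elements when the groups cannot be equally sized.

```r
# Hypothetical helper, not part of dplyr: splits x into n equally sized
# groups, assuming length(x) is a multiple of n and x contains no NAs
ntile_sketch <- function(x, n) {
  stopifnot(length(x) %% n == 0)
  r <- rank(x, ties.method = "first")  # ranks 1..length(x), ties by order
  group_size <- length(x) / n
  as.integer(ceiling(r / group_size))  # ranks 1..group_size -> group 1, etc.
}

x <- c(4, 1, 5, 2, 3, 3)
groups <- ntile_sketch(x, 3)
print(groups)
# [1] 3 1 3 1 2 2

print(table(groups))  # each of the three groups holds two elements
```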
Example 3: min_rank Function
This example illustrates the usage of the min_rank function:
min_rank(x) # Apply min_rank function
# 5 1 6 2 3 3
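The output of this function resembles the output of the row_number command in Example 1, but this time tied values (i.e. the two 3s) receive the same rank, namely the lowest rank of the tie (3). Since two elements share rank 3, rank 4 is skipped, and the value 4 receives rank 5.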
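The skipped rank is easy to see with a base R sketch: rank() with ties.method = "min" gives tied values the smallest rank of their tie. This is assumed to match dplyr::min_rank() for NA-free input; the check below only covers our example vector.

```r
x <- c(4, 1, 5, 2, 3, 3)

# Ties share the minimum rank of the tie; assumed to match
# dplyr::min_rank() for input without NAs
mr <- rank(x, ties.method = "min")
print(mr)
# [1] 5 1 6 2 3 3

# Rank 4 is never used because both 3s occupy rank 3
print(setdiff(seq_along(x), mr))
# [1] 4
```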
Example 4: dense_rank Function
Example 4 shows how to use the dense_rank function in R:
dense_rank(x) # Apply dense_rank function
# 4 1 5 2 3 3
The dense_rank function also returns the rank of each element of our input vector to the RStudio console. In contrast to min_rank, however, dense_rank does not leave a gap after tied values: even though the value 3 appears twice, the next value (4) receives rank 4, i.e. a rank that is only one number higher.
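One way to think about a dense rank is as the position of each value among the sorted distinct values. The base R sketch below uses that idea (an illustration, not dplyr's implementation, assuming NA-free input) and places the result next to the min-rank approach so the difference after the tie is visible.

```r
x <- c(4, 1, 5, 2, 3, 3)

# Dense ranking: index of each value within the sorted distinct values,
# so ties share a rank and no rank is skipped afterwards
dr <- match(x, sort(unique(x)))

# Side-by-side comparison with the minimum-rank approach
comparison <- data.frame(
  x = x,
  min_rank = rank(x, ties.method = "min"),
  dense_rank = dr
)
print(comparison)
```

The largest dense rank always equals the number of distinct values (here 5), whereas the largest minimum rank equals the number of elements (here 6).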
Example 5: percent_rank Function
In Example 5 we’ll apply the percent_rank function:
percent_rank(x) # Apply percent_rank function
# 0.8 0.0 1.0 0.2 0.4 0.4
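This R function rescales the output of min_rank to percentage ranks between 0 and 1 by computing (min_rank(x) - 1) / (n - 1), where n is the number of non-missing values. For example, the value 4 has a minimum rank of 5, which gives (5 - 1) / (6 - 1) = 0.8.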
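The rescaling can be reproduced in base R as a quick sanity check. This sketch assumes the vector has no NAs, so n is simply length(x); with missing values, n would have to count only the non-missing elements.

```r
x <- c(4, 1, 5, 2, 3, 3)

# Rescale the minimum rank to [0, 1]: the smallest value maps to 0,
# the largest to 1 (assumes no NAs, so n is simply length(x))
n <- length(x)
pr <- (rank(x, ties.method = "min") - 1) / (n - 1)
print(pr)
# [1] 0.8 0.0 1.0 0.2 0.4 0.4
```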
Example 6: cume_dist Function
Finally, we apply the cume_dist function:
cume_dist(x) # Apply cume_dist function
# 0.8333333 0.1666667 1.0000000 0.3333333 0.6666667 0.6666667
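The cume_dist function is a cumulative distribution function: it returns the proportion of all values that are less than or equal to the current value. For instance, five of the six values are less than or equal to 4, so the first element gets 5/6 = 0.8333333.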
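The definition translates directly into base R. The sketch below computes the proportion in three ways (a direct count, ranks with ties.method = "max", and the empirical distribution function from the stats package) and assumes an NA-free vector; it illustrates the idea rather than reproducing dplyr's code.

```r
x <- c(4, 1, 5, 2, 3, 3)
n <- length(x)

# Direct definition: share of values less than or equal to each element
cd_direct <- sapply(x, function(v) mean(x <= v))

# Same idea via ranks: tied values take the maximum rank of their tie
cd_rank <- rank(x, ties.method = "max") / n

# Base R's empirical cumulative distribution function, evaluated at x
cd_ecdf <- ecdf(x)(x)

print(cd_direct)
# [1] 0.8333333 0.1666667 1.0000000 0.3333333 0.6666667 0.6666667
```

Using the maximum rank is what makes ties count all equal values, which is why both 3s get 4/6 rather than 3/6.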
Video & Further Resources
Do you need further information on the dplyr package? Then you may want to have a look at the following video of the Statistics Globe YouTube channel. In the video, I’m explaining the dplyr package in some more detail:
Furthermore, you might have a look at the following related posts on my website:
- sort, order & rank Functions of Base R
- dplyr Package in R
- R Functions List (+ Examples)
- The R Programming Language
In summary: In this R tutorial you learned how to apply the windowed rank functions of the dplyr package. Don’t hesitate to let me know in the comments below if you have any additional questions. Furthermore, please subscribe to my email newsletter for updates on the newest articles.