R Error: `n()` must only be used inside dplyr verbs. (2 Examples)
In this article, I’ll illustrate how to reproduce and fix the error message “must only be used inside dplyr verbs” in the R programming language.
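The post is structured as follows:
1) Creation of Example Data
2) Example 1: Reproduce the Error: `n()` must only be used inside dplyr verbs.
3) Example 2: Fix the Error: `n()` must only be used inside dplyr verbs.
4) Video, Further Resources & Summary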
Let’s start right away…
Creation of Example Data
The following data is used as the basis for this R programming language tutorial:
data <- data.frame(value = 1:9,          # Create example data
                   group = letters[1:3])
data                                     # Print example data
Executing the previous R code returns our example data frame. It consists of nine rows and two columns called value and group.
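To double-check this structure, the following minimal sketch uses only base R. It also shows that the three letters in group are recycled across the nine rows, so each group contains three cases.

```r
# Recreate the example data (same code as above)
data <- data.frame(value = 1:9,
                   group = letters[1:3])

dim(data)                                # 9 rows, 2 columns
table(data$group)                        # Each of a, b, c occurs 3 times
```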
Example 1: Reproduce the Error: `n()` must only be used inside dplyr verbs.
In Example 1, I’ll show how to replicate the error “must only be used inside dplyr verbs” in the R programming language.
In order to reproduce this error message, we first have to install and load the dplyr package:
install.packages("dplyr")                # Install & load dplyr package
library("dplyr")
As a next step, we have to install and load the plyr package:
install.packages("plyr")                 # Install & load plyr
library("plyr")
Note that the order in which we load those two packages is key for the replication of the error message “must only be used inside dplyr verbs”.
Now, let’s assume that we want to count the number of cases within each group of our example data frame using the group_by, summarize, and n functions.
Then, we might try to execute the following R code:
data %>%                                 # Code leads to error
  group_by(group) %>%
  summarize(count = n())
# Error: `n()` must only be used inside dplyr verbs.
# Run `rlang::last_error()` to see where the error occurred.
Unfortunately, the RStudio console returns the error message “must only be used inside dplyr verbs”.
The reason for this is that a function called summarize exists in both the dplyr and the plyr package.
Since we have loaded the plyr package second, its summarize function masks the dplyr version, and R uses the plyr function by default.
However, n() only works when it is evaluated inside a dplyr verb. The summarize function of plyr evaluates n() outside of any dplyr verb, which triggers the error.
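If you are unsure which summarize function R will pick in your own session, you can ask R directly. The following is a minimal sketch, assuming that dplyr and plyr are installed and attached in the order used in this example; the object names hits and owner are only illustrative.

```r
library("dplyr")
library("plyr")                          # Attached second, masks dplyr functions

hits <- find("summarize")                # Attached packages that define summarize
hits                                     # Entries follow the search path order

owner <- environmentName(environment(summarize))
owner                                    # Package whose summarize is actually called
```

The first entry of hits is the package that wins when summarize is called without a prefix.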
So how can we fix this error? Keep on reading!
Example 2: Fix the Error: `n()` must only be used inside dplyr verbs.
This example illustrates how to avoid the error message “must only be used inside dplyr verbs”.
As explained before, the cause of this error is that R uses the summarize function of plyr instead of the summarize function of dplyr.
Fortunately, we can tell R explicitly which package to use by writing the package name and :: in front of the function name.
Consider the following R code:
data %>%                                 # Code does not lead to error
  group_by(group) %>%
  dplyr::summarize(count = n())
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 3 x 2
#   group count
#   <chr> <int>
# 1 a         3
# 2 b         3
# 3 c         3
The previous code works fine and runs without any error messages.
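If prefixing every call with dplyr:: is inconvenient, one alternative is to remove plyr from the search path, so that the unprefixed summarize refers to dplyr again. The following is a minimal sketch, assuming that both packages are installed and that nothing else in your session needs plyr to stay attached; the object names owner_after and result are only illustrative.

```r
library("dplyr")
library("plyr")                          # Reproduce the problematic load order

detach("package:plyr")                   # Remove plyr from the search path

owner_after <- environmentName(environment(summarize))
owner_after                              # summarize now belongs to dplyr

data <- data.frame(value = 1:9,          # Recreate example data
                   group = letters[1:3])

result <- data %>%                       # Unprefixed summarize works again
  group_by(group) %>%
  summarize(count = n())
result
```

Since the load order decides which function wins, loading plyr first and dplyr second in a fresh R session should have the same effect.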
Video, Further Resources & Summary
Do you want to learn more about dplyr errors? Then you may watch the following video tutorial on my YouTube channel. I explain the content of this tutorial in the video.
In addition, you could have a look at the related R programming tutorials on my homepage.
You have learned in this article how to handle the error message “must only be used inside dplyr verbs” in the R programming language.
However, I have some final notes before we close this tutorial:
In this tutorial, we have illustrated the error message “must only be used inside dplyr verbs” based on the summarize and n functions. However, this error might also occur when using other dplyr functions that are in conflict with other packages.
Furthermore, this error message depends on the version of dplyr and the exact R code that you use. Depending on your situation, you may also see the error message “function should not be called directly”. However, the reason for that error message is also a conflict with other packages.
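To see which other function names are affected by the same kind of conflict, you can compare the exported names of the two packages. The following is a minimal sketch, assuming that dplyr and plyr are installed; the object name shared is only illustrative, and the exact list depends on the package versions you have installed.

```r
library("dplyr")
library("plyr")

shared <- intersect(ls("package:dplyr"), # Names exported by both packages
                    ls("package:plyr"))
shared                                   # Includes summarize and mutate
```

Each name in shared is masked by whichever package was attached last, so it can be called safely with an explicit prefix such as dplyr::.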
Let me know in the comments if you have any additional questions. Furthermore, don’t forget to subscribe to my email newsletter in order to receive updates on the newest articles.
Comments
Great solution. Thanks a lot for sharing knowledge!
Hey Fabio,
Thanks a lot for the kind comment, glad you found it helpful!
Regards,
Joachim
Thanks for the clear explanation.
I was wondering if you know about %n% function.
The piece of code looks like this:
collaterals <- c("Other", "Debenture", "None")
collaterals %n% c("N", "Y", "None")
but the %n% function cannot be found and I'm not sure which package it comes from.
Hello,
Unfortunately, I couldn’t find an answer after some research. I only vaguely remember that it refers to “contains”. I can’t test it because the needed package, whichever it is, is not installed in my RStudio. In other words, the function is undefined for me. One suggestion would be to post this question in our Facebook discussion group. Someone there could help.
Regards,
Cansu
You saved my day, I must thank you a Ton !! 😀
Hi Charlotte,
Thank you very much for your comment! It’s always great to read that an article has been helpful for somebody!
Best,
Matthias
Thanks, it helped me.
Thank you very much for your feedback. Glad the article has been helpful for you!