R Error: `n()` must only be used inside dplyr verbs. (2 Examples)

In this article, I’ll illustrate how to reproduce and fix the error message “must only be used inside dplyr verbs” in the R programming language.

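The post is structured as follows:

1) Creation of Example Data
2) Example 1: Reproduce the Error: `n()` must only be used inside dplyr verbs.
3) Example 2: Fix the Error: `n()` must only be used inside dplyr verbs.
4) Video, Further Resources & Summary

Let’s start right away…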

Creation of Example Data

The following data is used as the basis for this R programming language tutorial:

data <- data.frame(value = 1:9,    # Create example data
                   group = letters[1:3])
data                               # Print example data

Table 1: Example Data Frame

Have a look at the table returned by the previous R code. It shows that our example data consists of nine rows and two columns: the variable value contains the integers 1 to 9, and the variable group assigns each row to one of the groups a, b, and c.
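For reference, the group counts that we will try to compute with dplyr below can also be obtained with base R alone, so neither dplyr nor plyr is involved. This is a small sketch using the base function table(); it recreates the example data so that it runs on its own.

```r
# Recreate the example data so that this snippet is self-contained
data <- data.frame(value = 1:9,
                   group = letters[1:3])

# Count the number of rows per group with base R only
group_sizes <- table(data$group)
group_sizes
```

Each of the groups a, b, and c contains three rows, which is the result we expect from the dplyr code in the following examples.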

Example 1: Reproduce the Error: `n()` must only be used inside dplyr verbs.

In Example 1, I’ll show how to replicate the error “must only be used inside dplyr verbs” in the R programming language.

In order to reproduce this error message, we first have to install and load the dplyr package:

install.packages("dplyr")          # Install & load dplyr package
library("dplyr")

As a next step, we have to install and load the plyr package:

install.packages("plyr")           # Install & load plyr
library("plyr")

Note that the order in which we load these two packages is key to reproducing the error message “must only be used inside dplyr verbs”: the error occurs because plyr is loaded after dplyr.

Now, let’s assume that we want to count the number of cases within each group of our example data frame using the group_by, summarize, and n functions.

Then, we might try to execute the following R code:

data %>%                           # Code leads to error
  group_by(group) %>% 
  summarize(count = n())
# Error: `n()` must only be used inside dplyr verbs.
# Run `rlang::last_error()` to see where the error occurred.

Unfortunately, the RStudio console returns the error message “must only be used inside dplyr verbs”.

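The reason for this is that a function called summarize exists in both the dplyr and the plyr package.

Since we have loaded the plyr package second, it is placed in front of dplyr on R’s search path. As a result, the summarize function of plyr masks the summarize function of dplyr, and R uses the plyr version whenever we call summarize without a package prefix.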
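You can verify this masking yourself. The base R function find() lists every attached package that contains an object with a given name, in search-path order, so the first entry is the version R uses when summarize is called without a prefix. A minimal sketch, assuming both packages are installed:

```r
# Attach dplyr first and plyr second, as in the example above
library("dplyr")
library("plyr")                 # R reports that plyr masks several dplyr functions

# All attached packages providing summarize(); the first entry wins
find("summarize")

# Name of the namespace that the unprefixed summarize() comes from
environmentName(environment(summarize))
```

With this loading order, plyr is listed before dplyr, and the unprefixed summarize comes from the plyr namespace.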

However, our R code above relies on the summarize function of the dplyr package. The plyr version does not know about the groups created by group_by, and it evaluates n() outside of a dplyr verb, where n() cannot work. This is exactly what the error message complains about.
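The following sketch illustrates this difference without calling n() at all. It summarizes the same grouped data once with plyr::summarize and once with dplyr::summarize, counting rows with the base function length() instead. Because both calls use an explicit package prefix, the result does not depend on the loading order; the sketch only assumes that both packages are installed.

```r
library("dplyr")                # for group_by() and the %>% pipe

data <- data.frame(value = 1:9,
                   group = letters[1:3])
grouped <- data %>% group_by(group)

# plyr's version has no notion of dplyr's groups,
# so length(value) is evaluated on all 9 rows at once
res_plyr <- plyr::summarize(grouped, count = length(value))

# dplyr's version respects the grouping: one row per group
res_dplyr <- dplyr::summarize(grouped, count = length(value))

res_plyr
res_dplyr
```

In the plyr result, count is 9 and the group column is gone, whereas the dplyr result contains one row with a count of 3 for each group.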

So how can we fix this error? Keep on reading!

Example 2: Fix the Error: `n()` must only be used inside dplyr verbs.

This example illustrates how to avoid the error message “must only be used inside dplyr verbs”.

As explained before, this error is caused by R picking the wrong summarize function (plyr instead of dplyr).

Fortunately, we can explicitly tell R which package to use by writing the package name followed by :: in front of the function name, for example dplyr::summarize.

Consider the following R code:

data %>%                           # Code does not lead to error
  group_by(group) %>% 
  dplyr::summarize(count = n())
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 3 x 2
#   group count
#   <chr> <int>
# 1 a         3
# 2 b         3
# 3 c         3

The previous code works fine and runs without any error messages.
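Prefixing the function with dplyr:: is not the only way around the conflict. If you do not need plyr in your current session, another option is to remove it from the search path with the base function detach(), so that the unprefixed summarize is found in dplyr again. A sketch, assuming both packages are installed:

```r
library("dplyr")
library("plyr")                 # plyr now masks dplyr's summarize()

data <- data.frame(value = 1:9,
                   group = letters[1:3])

# Remove plyr from the search path; its namespace may stay loaded,
# but unprefixed calls no longer reach plyr's summarize()
detach("package:plyr")

res <- data %>%
  group_by(group) %>%
  summarize(count = n())        # resolves to dplyr::summarize() again
res
```

If you need functions from both packages, loading plyr before dplyr in a fresh R session has a similar effect, since dplyr’s summarize then masks the plyr version.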

Video, Further Resources & Summary

Do you want to learn more about dplyr errors? Then you may watch the following video tutorial on my YouTube channel, in which I explain the content of this tutorial.

In addition, you could have a look at the related R programming tutorials on my homepage.

In this article, you have learned how to handle the error message “must only be used inside dplyr verbs” in the R programming language.

However, I have some final notes before we close this tutorial:

In this tutorial, we have illustrated the error message “must only be used inside dplyr verbs” based on the summarize and n functions. However, this error may also occur with other dplyr functions whose names clash with functions of other packages. For instance, plyr also provides its own mutate function.
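To see which other functions are affected in the same way by plyr, you can compare the names exported by both packages. A minimal sketch using the base function getNamespaceExports(); it only requires both packages to be installed, not attached:

```r
# Names exported by both dplyr and plyr; for each of them, the package
# that is attached last determines which version an unprefixed call uses
shared <- intersect(getNamespaceExports("dplyr"),
                    getNamespaceExports("plyr"))
sort(shared)
```

For the packages attached in your current session, the base function conflicts(detail = TRUE) returns, for each entry of the search path, the object names that also occur elsewhere on the search path.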

Furthermore, the exact wording of this error message depends on your version of dplyr and on the R code that you use. Depending on your situation, you may also see the error message “function should not be called directly”. However, this message has the same cause: a conflict with a function of another package.

If you have any further questions, let me know in the comments. Furthermore, don’t forget to subscribe to my email newsletter in order to receive updates on the newest articles.
