case_when & cases Functions in R (2 Examples)
This article illustrates how to apply the case_when and cases functions in R.
You’re here for the answer, so let’s get straight to the examples.
Creation of Example Data
As a first step, let’s create some example data. In this example, we’ll use the numeric vector x1…
x1 <- 1:6                                                 # Create first vector
x1                                                        # Print first vector
# 1 2 3 4 5 6
…and the character vector x2 as a basis:
x2 <- letters[1:6]                                        # Create second vector
x2                                                        # Print second vector
# "a" "b" "c" "d" "e" "f"
Next, I’ll explain how to create a new vector based on logical conditions that involve the example vectors x1 and x2.
In other words: I’ll show R functions that work like the popular CASE WHEN expression in SQL.
Example 1: Distinguish between Cases Using case_when() Function of dplyr Package
In this example, I’ll explain how to apply the case_when function of the dplyr package to conditionally create a new vector in R.
First, we have to install and load the dplyr package:
install.packages("dplyr")                                 # Install & load dplyr package
library("dplyr")
Now, we can apply the case_when function as shown below:
new_dplyr <- case_when(x1 < 3 ~ "Group 1",                # Applying case_when
                       x2 %in% letters[2:5] ~ "Group 2",
                       TRUE ~ "Group 3")
new_dplyr                                                 # Print output
# [1] "Group 1" "Group 1" "Group 2" "Group 2" "Group 2" "Group 3"
Have a look at the previous output of the RStudio console: It shows that our new vector contains three different groups that were assigned depending on the logical conditions that we have specified within the case_when function.
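The final TRUE ~ "Group 3" line acts as a catch-all for every element that matches none of the earlier conditions. As a minimal sketch (the vectors are re-created so that the snippet runs on its own, and the object name no_fallback is just an illustrative choice), this is what happens when that line is left out:

```r
library("dplyr")

x1 <- 1:6                                                 # Same example vectors as above
x2 <- letters[1:6]

no_fallback <- case_when(x1 < 3 ~ "Group 1",              # No TRUE ~ ... catch-all
                         x2 %in% letters[2:5] ~ "Group 2")
no_fallback                                               # Unmatched element becomes NA
# [1] "Group 1" "Group 1" "Group 2" "Group 2" "Group 2" NA
```

The sixth element (x1 = 6 and x2 = "f") satisfies neither condition, so case_when returns NA for it.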
Note that in some cases multiple conditions are TRUE. For instance, the second elements of our two input vectors (i.e. x1 = 2 and x2 = “b”) satisfy all three logical conditions.
In such a case, the case_when function automatically assigns the output of the first matching condition (i.e. “Group 1”) to the new vector.
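Because the first matching condition wins, the order of the conditions matters. The following sketch illustrates this by swapping the first two conditions (the vectors are re-created so that the snippet is self-contained, and the object name swapped is only used for illustration):

```r
library("dplyr")

x1 <- 1:6                                                 # Same example vectors as above
x2 <- letters[1:6]

swapped <- case_when(x2 %in% letters[2:5] ~ "Group 2",    # Conditions in swapped order
                     x1 < 3 ~ "Group 1",
                     TRUE ~ "Group 3")
swapped                                                   # Second element changes its group
# [1] "Group 1" "Group 2" "Group 2" "Group 2" "Group 2" "Group 3"
```

The second element now ends up in “Group 2”, since the %in% condition is checked before x1 < 3.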
Anyway, let’s compare the syntax of the case_when function to the cases function…
Example 2: Distinguish between Cases Using cases() Function of memisc Package
In this example, I’ll illustrate how to apply the cases function of the memisc package.
First, we need to install and load the memisc package:
install.packages("memisc")                                # Install memisc package
library("memisc")                                         # Load memisc
Now, we can apply the cases command as shown below:
new_memisc <- cases("Group 1" = x1 < 3,                   # Applying cases
                    "Group 2" = x2 %in% letters[2:5],
                    "Group 3" = TRUE)
# Warning message:
# In cases(`Group 1` = x1 < 3, `Group 2` = x2 %in% letters[2:5], `Group 3` = TRUE) :
#   conditions are not mutually exclusive
The RStudio console returns a warning message after running the previous R code. This is due to the overlap of logical conditions that we have discussed before in Example 1. However, the cases function also automatically uses the first logical condition that is TRUE to define the final output.
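To see where the overlap comes from, it can help to count how many conditions are TRUE for each element. The sketch below uses only Base R; the object names (c1, c2, n_true, e1, e2, e3) are illustrative and not part of memisc. The second half shows one possible way to rewrite the conditions so that exactly one of them is TRUE per element. I would expect conditions like these to avoid the warning when passed to cases, but that part is an assumption and is not run here:

```r
x1 <- 1:6                                                 # Same example vectors as above
x2 <- letters[1:6]

c1 <- x1 < 3                                              # Original conditions
c2 <- x2 %in% letters[2:5]
n_true <- c1 + c2 + TRUE                                  # Number of TRUE conditions per element
n_true
# [1] 2 3 2 2 2 1

e1 <- x1 < 3                                              # Mutually exclusive variants
e2 <- (!e1) & (x2 %in% letters[2:5])
e3 <- (!e1) & (!e2)
all(rowSums(cbind(e1, e2, e3)) == 1)                      # Exactly one TRUE per element
# [1] TRUE
```

With the original conditions, only the last element satisfies a single condition, which is why the overlap is flagged.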
Let’s see what the new vector created by the cases function looks like:
new_memisc                                                # Print output
# [1] Group 1 Group 1 Group 2 Group 2 Group 2 Group 3
# Levels: Group 1 Group 2 Group 3
As you can see, the group values are exactly the same as in Example 1. However, the cases function returns its output with the factor class (in contrast to the case_when function, which returned character strings).
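If you need the same class from both approaches, you can convert between character strings and factors with Base R. The following is a small sketch that uses only the dplyr result, so it runs without memisc; the name new_dplyr_fac is an illustrative choice:

```r
library("dplyr")

x1 <- 1:6                                                 # Same example vectors as above
x2 <- letters[1:6]

new_dplyr <- case_when(x1 < 3 ~ "Group 1",
                       x2 %in% letters[2:5] ~ "Group 2",
                       TRUE ~ "Group 3")
class(new_dplyr)                                          # case_when returned strings
# [1] "character"

new_dplyr_fac <- factor(new_dplyr)                        # Convert character to factor
levels(new_dplyr_fac)
# [1] "Group 1" "Group 2" "Group 3"

identical(as.character(new_dplyr_fac), new_dplyr)         # And back to character
# [1] TRUE
```

In the same way, as.character can be applied to a factor such as the output of the cases function to get character strings.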
Video & Further Resources
Do you need more explanations of the examples in this article? Then I recommend having a look at the following video, which I have published on my YouTube channel. The video covers the contents of this tutorial:
The YouTube video will be added soon.
Besides that, you might want to read the other tutorials on my website. Some articles about data manipulation in R can be found here.
This tutorial showed how to conditionally specify a new vector in the R programming language. Don’t hesitate to let me know in the comments, in case you have further questions.
2 Comments
When I ran the code in RStudio, I got an error:
> new_dplyr <- case_when(x1 < 3 ~ "Group 1", # Applying case_when
+ x2 %in% letters[2:5] ~ "Group 2",
+ TRUE ~ "Group 3")
Error: `x2 %in% letters[2:5] ~ "Group 2"` must be length 6 or one, not 5.
Hey Udayan,
Thank you for the comment.
This is strange. I checked it and for me it works fine.
Could you try to add dplyr:: in front of the case_when function? I.e. dplyr::case_when
If this doesn’t work, could you try to re-install dplyr?
Regards,
Joachim