case_when & cases Functions in R (2 Examples)
This article illustrates how to apply the case_when and cases functions in R.
You’re here for the answer, so let’s get straight to the examples.
Creation of Exemplifying Data
As a first step, let’s create some example data. In this example, we’ll use the numeric vector x1…
x1 <- 1:6                                    # Create first vector
x1                                           # Print first vector
# [1] 1 2 3 4 5 6
…and the character vector x2 as a basis:
x2 <- letters[1:6]                           # Create second vector
x2                                           # Print second vector
# [1] "a" "b" "c" "d" "e" "f"
Next, I’ll explain how to create a new vector based on logical conditions that involve the example vectors x1 and x2.
In other words: I’m showing R programming functions that are equivalent to the popular CASE WHEN SQL statement.
Example 1: Distinguish between Cases Using case_when() Function of dplyr Package
In this example, I’ll explain how to apply the case_when function of the dplyr package to conditionally create a new vector in R.
First, we have to install and load the dplyr package:
install.packages("dplyr")                    # Install dplyr package
library("dplyr")                             # Load dplyr package
Now, we can apply the case_when function as shown below:
new_dplyr <- case_when(x1 < 3 ~ "Group 1",                 # Applying case_when
                       x2 %in% letters[2:5] ~ "Group 2",
                       TRUE ~ "Group 3")
new_dplyr                                                  # Print output
# [1] "Group 1" "Group 1" "Group 2" "Group 2" "Group 2" "Group 3"
Have a look at the previous output of the RStudio console: It shows that our new vector contains three different groups that were assigned depending on the logical conditions that we have specified within the case_when function.
Note that in some cases multiple conditions are TRUE. For instance, the second elements of our two input vectors (i.e. x1 = 2 and x2 = “b”) satisfy all three logical conditions.
In such a case, the case_when function automatically assigns the first output (i.e. “Group 1”) to the new vector.
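To illustrate that the order of the conditions matters, the following sketch swaps the first two conditions of the example. The object name swapped is made up for this illustration, and the input vectors are recreated so that the snippet runs on its own (it assumes that dplyr is installed):

```r
library("dplyr")                             # Assumes dplyr is installed

x1 <- 1:6                                    # Recreate example vectors
x2 <- letters[1:6]

# Same conditions as before, but "Group 2" is now checked first
swapped <- case_when(x2 %in% letters[2:5] ~ "Group 2",
                     x1 < 3 ~ "Group 1",
                     TRUE ~ "Group 3")
swapped                                      # Print output
# [1] "Group 1" "Group 2" "Group 2" "Group 2" "Group 2" "Group 3"
```

The second element now falls into “Group 2”, because that condition is the first one that evaluates to TRUE for it.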
Anyway, let’s compare the syntax of the case_when function to the cases function…
Example 2: Distinguish between Cases Using cases() Function of memisc Package
In this example, I’ll illustrate how to apply the cases function of the memisc package.
First, we need to install and load the memisc package to RStudio:
install.packages("memisc")                   # Install memisc package
library("memisc")                            # Load memisc
Now, we can apply the cases command as shown below:
new_memisc <- cases("Group 1" = x1 < 3,                    # Applying cases
                    "Group 2" = x2 %in% letters[2:5],
                    "Group 3" = TRUE)
# Warning message:
# In cases(`Group 1` = x1 < 3, `Group 2` = x2 %in% letters[2:5], `Group 3` = TRUE) :
#   conditions are not mutually exclusive
The RStudio console returns a warning message after running the previous R code. This is due to the overlap of logical conditions that we have discussed before in Example 1. However, the cases function also automatically uses the first logical condition that is TRUE to define the final output.
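The overlap behind this warning can be checked with base R alone. The following sketch counts, for each element, how many of the three conditions are TRUE and which condition is the first TRUE one. The object names conds, n_true, and first_true are made up for this illustration; this is not how memisc works internally, just a way to inspect the conditions:

```r
x1 <- 1:6                                    # Recreate example vectors
x2 <- letters[1:6]

# One column per condition; the single TRUE is recycled to all six rows
conds <- cbind(x1 < 3,
               x2 %in% letters[2:5],
               TRUE)

n_true <- rowSums(conds)                     # Number of TRUE conditions per element
n_true
# [1] 2 3 2 2 2 1

first_true <- apply(conds, 1, which.max)     # Index of first TRUE condition
first_true
# [1] 1 1 2 2 2 3
```

Every element with a count above 1 matches several conditions, which is why the conditions are not mutually exclusive. The index of the first TRUE condition corresponds to the group that ends up in the output.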
Let’s see what the new vector created by the cases function looks like:
new_memisc                                   # Print output
# [1] Group 1 Group 1 Group 2 Group 2 Group 2 Group 3
# Levels: Group 1 Group 2 Group 3
As you can see, the group values are exactly the same as in Example 1. However, the cases function returns its output with the factor class (in contrast to the case_when function, which returned character strings).
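In case you prefer character strings, a factor can be converted with the base R function as.character. The sketch below uses a manually created factor (the hypothetical object fac_groups) as a stand-in for the output of the cases function, so that it runs without memisc:

```r
# Stand-in for the factor output shown above (hypothetical object name)
fac_groups <- factor(c("Group 1", "Group 1", "Group 2",
                       "Group 2", "Group 2", "Group 3"))
class(fac_groups)                            # Check class
# [1] "factor"

chr_groups <- as.character(fac_groups)       # Convert factor to character
class(chr_groups)                            # Check class again
# [1] "character"
```

After the conversion, the values have the same class as the output of the case_when function in Example 1.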
Video & Further Resources
Do you need more explanations on the examples of this article? Then I recommend having a look at the following video, which I have published on my YouTube channel. I show the contents of this tutorial in the video:
The YouTube video will be added soon.
Besides that, you might want to read the other tutorials on my website. Some articles about data manipulation in R can be found here.
This tutorial showed how to conditionally specify a new vector in the R programming language. Don’t hesitate to let me know in the comments, in case you have further questions.
10 Comments
When I ran the code in RStudio, I got an error:
> new_dplyr <- case_when(x1 < 3 ~ "Group 1", # Applying case_when
+ x2 %in% letters[2:5] ~ "Group 2",
+ TRUE ~ "Group 3")
Error: `x2 %in% letters[2:5] ~ "Group 2"` must be length 6 or one, not 5.
Hey Udayan,
Thank you for the comment.
This is strange. I checked it and for me it works fine.
Could you try to add dplyr:: in front of the case_when function? I.e. dplyr::case_when
If this doesn’t work, could you try to re-install dplyr?
Regards,
Joachim
Sir, I am also facing a similar problem.
I wanted to add a column “sector” to identify the stocks.
This is my data structure:
symbol date monthly.returns
JPM 1990-01-01 -0.1416664625
JPM 1990-02-01 -0.0048545287
JPM 1990-03-01 -0.0646765162
JPM 1990-04-01 0.0000000000
JPM 1990-05-01 0.1711227805
JPM 1990-06-01 -0.0456217222
JPM 1990-07-01 -0.0980394562
JPM 1990-08-01 -0.0597825387
JPM 1990-09-01 -0.2612770939
JPM 1990-10-01 -0.3467746505
So I have 55 symbols:
symbols <- c("JPM","C","BAC","WFC","USB",
"PFE","ABT","BMY","CVS","JNJ",
"VZ","T","DIS","CMCSA","OMC",
"XOM","CVX","OXY","COP","VLO",
"WELL","WY","LPX","PEAK","PSA",
"NEM","ECL","APD","GWW","CLF",
"PG","KO","COST","WMT","KR",
"TGT","NKE","MCD","HD","CCL",
"DUK","UGI","D","AEP","NEE",
"AAPL","MSFT","IBM","ORCL","ADBE",
"GE","HON","BA","CAT","CMI"
)
and I want to add a column that identifies the sector of each of these stocks, e.g.
"GE" ~ "Industrials",
"AAPL" ~ "Tech".
So how can I go forward with case_when?
Hi Mandar,
Could you please explain in some more detail what your desired output should look like? I’m afraid I don’t understand what you are trying to do.
Regards,
Joachim
Sir,
I am trying to create a column called sector to identify the respective symbols as per the sectors, like AAPL – Tech, JPM – Financials. In my previous mail, I showed the returns data.
I am using ifelse:
returns_tbl$sector <- ifelse(returns_tbl$symbol %in%c("JPM","C","BAC","WFC","USB"),"financials",
ifelse(returns_tbl$symbol %in%c("PFE","ABT","BMY","CVS","JNJ"),"health-care",
ifelse(returns_tbl$symbol %in%c("VZ","T","DIS","CMCSA","OMC"),"communications",
ifelse(returns_tbl$symbol %in%c("XOM","CVX","OXY","COP","VLO"),"energy",
ifelse(returns_tbl$symbol %in%c("WELL","WY","LPX","PEAK","PSA"),"real_estate","NA")))))
So how should I use case_when in this scenario?
Hey Mandar,
Thank you for the clarifications regarding your question. You may use the following code to achieve this:
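A minimal sketch of how case_when could replace the nested ifelse calls, assuming a data frame called returns_tbl with a symbol column as in the question. The small data frame below is made up for illustration, and unmatched symbols get a real missing value (NA_character_) instead of the string "NA":

```r
library("dplyr")                             # Assumes dplyr is installed

# Illustrative stand-in for the returns data (one symbol per sector plus AAPL)
returns_tbl <- data.frame(symbol = c("JPM", "PFE", "VZ", "XOM", "WELL", "AAPL"))

returns_tbl <- returns_tbl %>%
  mutate(sector = case_when(
    symbol %in% c("JPM", "C", "BAC", "WFC", "USB")     ~ "financials",
    symbol %in% c("PFE", "ABT", "BMY", "CVS", "JNJ")   ~ "health-care",
    symbol %in% c("VZ", "T", "DIS", "CMCSA", "OMC")    ~ "communications",
    symbol %in% c("XOM", "CVX", "OXY", "COP", "VLO")   ~ "energy",
    symbol %in% c("WELL", "WY", "LPX", "PEAK", "PSA")  ~ "real_estate",
    TRUE                                               ~ NA_character_  # No sector matched
  ))
returns_tbl                                  # Print updated data
```

Further sectors can be added as additional lines before the final TRUE condition.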
Regards,
Joachim
Hi Joachim,
I have a data.frame like this:
dtf <- data.frame(x = c("A","B","A","B"), y = c(1,3,5,8))
How can I use 'case_when()' to mutate a new variable that recodes the 'y' column with rules like: if x == "A" then Z = NA, if x == "B" then Z = y? Here, y is of course the value of the y column in the data.frame.
Thank you in advance
Hello Karol,
There is a debate on StackOverflow that case_when() does not work properly with mutate() in some versions of dplyr. That might be an issue. What about using the following instead?
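One possible approach, sketched under the assumption that the new column should be called Z as in the question: the base R function ifelse does not depend on dplyr at all, and the case_when version below uses NA_real_ so that all outputs have the same type. The column name Z_base is made up for this illustration:

```r
library("dplyr")                             # Assumes dplyr is installed

dtf <- data.frame(x = c("A", "B", "A", "B"), # Example data from the question
                  y = c(1, 3, 5, 8))

# Alternative 1: base R ifelse (no dplyr needed for this line)
dtf$Z_base <- ifelse(dtf$x == "B", dtf$y, NA)

# Alternative 2: case_when within mutate; NA_real_ matches the type of y
dtf <- dtf %>%
  mutate(Z = case_when(x == "A" ~ NA_real_,
                       x == "B" ~ y))
dtf                                          # Print updated data
```

Both alternatives set Z to NA for the rows with x == "A" and to the value of y for the rows with x == "B".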
Regards,
Cansu
Thank You,
Your suggestion about the version of the package was completely valid. I’ve updated the ‘tidyverse’ and all works nicely…
By the way – thanks for all your work and the helpful videos on YT. I believe it takes a lot of your time and effort. But they are all interesting and smart… Sometimes as easy as updating packages, but always smart 🙂
Hello Karol,
I am glad that my suggestion helped you 🙂 Thank you for your kind and sweet words, such feedback motivates us to work hard!
Regards,
Cansu