case_when & cases Functions in R (2 Examples)

 

This article illustrates how to apply the case_when and cases functions in R.

You’re here for the answer, so let’s get straight to the examples.

 

Creation of Exemplifying Data

As a first step, let’s create some example data. In this example, we’ll use the numeric vector x1…

x1 <- 1:6                                     # Create first vector
x1                                            # Print first vector
# 1 2 3 4 5 6

…and the character vector x2 as a basis:

x2 <- letters[1:6]                            # Create second vector
x2                                            # Print second vector
# "a" "b" "c" "d" "e" "f"

Next, I’ll explain how to create a new vector based on logical conditions that involve the example vectors x1 and x2.

In other words: I’m showing R programming functions that are equivalent to the popular CASE WHEN SQL statement.

 

Example 1: Distinguish between Cases Using case_when() Function of dplyr Package

In this example, I’ll explain how to apply the case_when function of the dplyr package to conditionally create a new vector in R.

First, we have to install and load the dplyr package:

install.packages("dplyr")                     # Install & load dplyr package
library("dplyr")

Now, we can apply the case_when function as shown below:

new_dplyr <- case_when(x1 < 3 ~ "Group 1",    # Applying case_when
                       x2 %in% letters[2:5] ~ "Group 2",
                       TRUE ~ "Group 3")
new_dplyr                                     # Print output
# [1] "Group 1" "Group 1" "Group 2" "Group 2" "Group 2" "Group 3"

Have a look at the previous output of the RStudio console: It shows that our new vector contains three different groups that were assigned depending on the logical conditions that we have specified within the case_when function.
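For comparison, the same grouping can be sketched in base R with nested ifelse() calls. This is only an illustration of the logic; the object name new_base is chosen for this sketch and is not part of the original example.

```r
x1 <- 1:6                                     # Example vectors from above
x2 <- letters[1:6]

new_base <- ifelse(x1 < 3, "Group 1",         # First condition is checked first
            ifelse(x2 %in% letters[2:5], "Group 2",
                   "Group 3"))                # Fallback for all remaining elements
new_base                                      # Print output
# [1] "Group 1" "Group 1" "Group 2" "Group 2" "Group 2" "Group 3"
```

The nesting grows with every additional condition, which is one reason why a flat list of conditions is often easier to read.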

Note that in some cases multiple conditions are TRUE. For instance, the second elements of our two input vectors (i.e. x1 = 2 and x2 = “b”) satisfy all three logical conditions.

In such a case, the case_when function automatically assigns the first output (i.e. “Group 1”) to the new vector.
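A small sketch can make this first-match behavior visible: if the first two conditions are swapped, the second element ends up in “Group 2” instead of “Group 1”. The object name new_swapped is chosen for this illustration, and the code assumes that dplyr is already installed.

```r
library("dplyr")                              # Assumes dplyr is installed

x1 <- 1:6                                     # Example vectors from above
x2 <- letters[1:6]

new_swapped <- case_when(x2 %in% letters[2:5] ~ "Group 2",  # Now checked first
                         x1 < 3 ~ "Group 1",
                         TRUE ~ "Group 3")
new_swapped                                   # Print output
# [1] "Group 1" "Group 2" "Group 2" "Group 2" "Group 2" "Group 3"
```

The order of the conditions is therefore part of the logic: more specific conditions should usually come before more general ones.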

Anyway, let’s compare the syntax of the case_when function to the cases function…

 

Example 2: Distinguish between Cases Using cases() Function of memisc Package

In this example, I’ll illustrate how to apply the cases function of the memisc package.

First, we need to install and load the memisc package:

install.packages("memisc")                    # Install memisc package
library("memisc")                             # Load memisc

Now, we can apply the cases command as shown below:

new_memisc <- cases("Group 1" = x1 < 3,       # Applying cases
                    "Group 2" = x2 %in% letters[2:5],
                    "Group 3" = TRUE)
# Warning message:
#   In cases(`Group 1` = x1 < 3, `Group 2` = x2 %in% letters[2:5], `Group 3` = TRUE) :
#   conditions are not mutually exclusive

The RStudio console returns a warning message after running the previous R code. This is due to the overlap of logical conditions that we have discussed before in Example 1. However, just like case_when, the cases function automatically uses the first logical condition that is TRUE to define the final output.

Let’s see what the new vector created by the cases function looks like:

new_memisc                                    # Print output
# [1] Group 1 Group 1 Group 2 Group 2 Group 2 Group 3
# Levels: Group 1 Group 2 Group 3

As you can see, the group values are exactly the same as in Example 1. However, the cases function returns its output with the factor class (in contrast to the case_when function that returned character strings).
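If a character vector is preferred, a factor like this can be converted with as.character(). The following is a minimal sketch: the factor is rebuilt by hand as a stand-in for the memisc output so that the code runs without memisc, and the name new_memisc_chr is chosen for this illustration.

```r
# Stand-in for the factor printed above, rebuilt by hand so that this
# sketch runs without the memisc package
new_memisc <- factor(c("Group 1", "Group 1", "Group 2",
                       "Group 2", "Group 2", "Group 3"))
class(new_memisc)                             # Check class
# [1] "factor"

new_memisc_chr <- as.character(new_memisc)    # Convert factor to character
class(new_memisc_chr)                         # Check class again
# [1] "character"
```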

 

Video & Further Resources

Do you need more explanations on the examples of this article? Then I recommend having a look at the following video that I have published on my YouTube channel. I show the contents of this tutorial in the video:

 

The YouTube video will be added soon.

 

Besides that, you might want to read the other tutorials on my website. Some articles about data manipulation in R can be found here.

 

This tutorial showed how to conditionally specify a new vector in the R programming language. Don’t hesitate to let me know in the comments if you have further questions.

 



10 Comments

  • When I ran the code in RStudio, I got an error:
    > new_dplyr <- case_when(x1 < 3 ~ "Group 1", # Applying case_when
    + x2 %in% letters[2:5] ~ "Group 2",
    + TRUE ~ "Group 3")
    Error: `x2 %in% letters[2:5] ~ "Group 2"` must be length 6 or one, not 5.

    • Hey Udayan,

      Thank you for the comment.

      This is strange. I checked it and for me it works fine.

      Could you try to add dplyr:: in front of the case_when function? I.e. dplyr::case_when

      If this doesn’t work, could you try to re-install dplyr?

      Regards,

      Joachim
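Judging only from the error message, one possible cause (an assumption, not confirmed in this thread) is that x2 had only five elements in that R session. The condition x2 %in% letters[2:5] always has the same length as x2, so a quick length check can help. The vector x2_short below is hypothetical and only serves to reproduce the length mismatch.

```r
x1 <- 1:6                                     # Vector as in the tutorial
x2_short <- letters[1:5]                      # Hypothetical: only five elements

length(x1 < 3)                                # Length of first condition
# [1] 6
length(x2_short %in% letters[2:5])            # Length of second condition
# [1] 5
```

If the two lengths differ, re-creating x1 and x2 as shown at the beginning of the tutorial should make all conditions the same length again.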

  • Sir, I am also facing a similar problem.
    I want to add a column “sector” to identify the stocks.
    This is my data structure:

    symbol date monthly.returns

    JPM 1990-01-01 -0.1416664625
    JPM 1990-02-01 -0.0048545287
    JPM 1990-03-01 -0.0646765162
    JPM 1990-04-01 0.0000000000
    JPM 1990-05-01 0.1711227805
    JPM 1990-06-01 -0.0456217222
    JPM 1990-07-01 -0.0980394562
    JPM 1990-08-01 -0.0597825387
    JPM 1990-09-01 -0.2612770939
    JPM 1990-10-01 -0.3467746505

    So I have 55 symbols:
    symbols <- c("JPM","C","BAC","WFC","USB",
    "PFE","ABT","BMY","CVS","JNJ",
    "VZ","T","DIS","CMCSA","OMC",
    "XOM","CVX","OXY","COP","VLO",
    "WELL","WY","LPX","PEAK","PSA",
    "NEM","ECL","APD","GWW","CLF",
    "PG","KO","COST","WMT","KR",
    "TGT","NKE","MCD","HD","CCL",
    "DUK","UGI","D","AEP","NEE",
    "AAPL","MSFT","IBM","ORCL","ADBE",
    "GE","HON","BA","CAT","CMI"
    )
    and I want to add a column that identifies each of these stocks by sector, e.g.
    "GE" ~ "Industrials",
    "AAPL" ~ "Tech".
    How can I go forward with case_when?

    • Hi Mandar,

      Could you please explain in some more detail what your desired output should look like? I’m afraid I don’t understand what you are trying to do.

      Regards,
      Joachim

  • Sir,
    I am trying to create a column called sector that identifies the respective symbols by sector, e.g. AAPL: Tech, JPM: Financials. In my previous message, I showed the returns data.
    I am using ifelse:
    returns_tbl$sector <- ifelse(returns_tbl$symbol %in% c("JPM","C","BAC","WFC","USB"), "financials",
                          ifelse(returns_tbl$symbol %in% c("PFE","ABT","BMY","CVS","JNJ"), "health-care",
                          ifelse(returns_tbl$symbol %in% c("VZ","T","DIS","CMCSA","OMC"), "communications",
                          ifelse(returns_tbl$symbol %in% c("XOM","CVX","OXY","COP","VLO"), "energy",
                          ifelse(returns_tbl$symbol %in% c("WELL","WY","LPX","PEAK","PSA"), "real_estate", "NA")))))

    So how should I use case_when in this scenario?

    • Hey Mandar,

      Thank you for the clarifications regarding your question. You may use the following code to achieve this:

      returns_tbl$sector <- NA
      returns_tbl$sector[returns_tbl$symbol %in% c("JPM","C","BAC","WFC","USB")] <- "financials"
      returns_tbl$sector[returns_tbl$symbol %in% c("PFE","ABT","BMY","CVS","JNJ")] <- "health-care"
      # ...

      Regards,
      Joachim
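For reference, the same recoding could also be sketched with case_when() inside mutate(). The small returns_tbl below is a hypothetical stand-in that contains only a symbol column, and the sector vectors are the five listed in the question; symbols that match no sector receive a missing value (NA_character_) instead of the string "NA". The code assumes that dplyr is installed.

```r
library("dplyr")                              # Assumes dplyr is installed

# Hypothetical miniature stand-in for returns_tbl (symbol column only)
returns_tbl <- data.frame(symbol = c("JPM", "PFE", "VZ", "XOM", "WELL", "AAPL"))

returns_tbl <- returns_tbl %>%
  mutate(sector = case_when(
    symbol %in% c("JPM", "C", "BAC", "WFC", "USB")      ~ "financials",
    symbol %in% c("PFE", "ABT", "BMY", "CVS", "JNJ")    ~ "health-care",
    symbol %in% c("VZ", "T", "DIS", "CMCSA", "OMC")     ~ "communications",
    symbol %in% c("XOM", "CVX", "OXY", "COP", "VLO")    ~ "energy",
    symbol %in% c("WELL", "WY", "LPX", "PEAK", "PSA")   ~ "real_estate",
    TRUE                                                ~ NA_character_  # No sector listed
  ))
returns_tbl                                   # Print updated data
```

Further sectors can be added as additional lines above the final TRUE condition.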

  • Hi Joachim,
    I have a data.frame like this:
    dtf <- data.frame(x = c("A","B","A","B"), y = c(1,3,5,8))
    How can I use case_when() to mutate a new variable z that recodes the y column with rules like: if x == "A" then z = NA, if x == "B" then z = y? Here, y is of course the value of the y column in the data.frame.
    Thank you in advance

    • Hello Karol,

      There is a debate on StackOverflow that case_when() does not work properly with mutate() in some versions of dplyr. That might be an issue. What about using the following instead?

      dtf <- dtf %>% mutate(z = ifelse(x == "A", NA, y))
      dtf
      #   x y  z
      # 1 A 1 NA
      # 2 B 3  3
      # 3 A 5 NA
      # 4 B 8  8

      Regards,
      Cansu
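With an up-to-date dplyr installation, the recoding can also be sketched with case_when() itself. The typed missing value NA_real_ is used here so that both outcomes are numeric, which should also keep older dplyr versions from complaining about mixed types; this is a minimal sketch, not the only way to write it.

```r
library("dplyr")                              # Assumes dplyr is installed

dtf <- data.frame(x = c("A", "B", "A", "B"),  # Data from the question
                  y = c(1, 3, 5, 8))

dtf <- dtf %>%
  mutate(z = case_when(x == "A" ~ NA_real_,   # Missing value if x is "A"
                       x == "B" ~ y))         # Value of y if x is "B"
dtf                                           # Print updated data
#   x y  z
# 1 A 1 NA
# 2 B 3  3
# 3 A 5 NA
# 4 B 8  8
```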

      • Thank You,
        Your suggestion about the package version was completely valid. I’ve updated the ‘tidyverse’ and everything works nicely…
        By the way, thanks for all your work and the helpful videos on YT. I believe it takes a lot of your time and effort, but they are all interesting and smart… Sometimes the answer is as easy as updating packages, but always smart 🙂
