Create Dummy Variable in R (3 Examples)

 

This tutorial shows how to generate dummy variables in the R programming language.

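The tutorial consists of the following content blocks:

1) Example 1: Convert Character String with Two Values to Dummy Using ifelse() Function
2) Example 2: Convert Categorical Variable to Dummy Matrix Using model.matrix() Function
3) Example 3: Generate Random Dummy Vector Using rbinom() Function
4) Video, Further Resources & Summary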

So let’s just jump right in…

 

Example 1: Convert Character String with Two Values to Dummy Using ifelse() Function

In Example 1, I’ll explain how to convert a character vector (or a factor) that contains two different values to a dummy indicator. Let’s first create such a character vector in R:

vec1 <- c("yes", "no", "no", "yes", "no")               # Create input vector
vec1                                                    # Print input vector
# [1] "yes" "no"  "no"  "yes" "no"

The previous RStudio console output shows the structure of our example vector. It consists of five character strings that are either “yes” or “no”.

We can now convert this input vector to a numeric dummy indicator using the ifelse function:

dummy1 <- ifelse(vec1 == "yes", 1, 0)                   # Applying ifelse function
dummy1                                                  # Print dummy
# [1] 1 0 0 1 0

Our dummy vector is equal to 1 in case the input vector was equal to “yes”; and equal to 0 in case the input vector was equal to “no”.
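
As a side note, a logical comparison can also be converted to 0/1 directly, without ifelse. The following is only a minimal sketch of that alternative; the object names dummy1_alt and vec1_na are introduced here for illustration. Note that as.integer returns an integer vector, whereas the ifelse call above returns a numeric vector. The second part shows that a missing input value stays missing in the dummy:

```r
vec1 <- c("yes", "no", "no", "yes", "no")               # Same input vector as above
dummy1_alt <- as.integer(vec1 == "yes")                 # TRUE becomes 1, FALSE becomes 0
dummy1_alt                                              # Print dummy
# [1] 1 0 0 1 0

vec1_na <- c("yes", NA, "no")                           # Hypothetical vector with a missing value
ifelse(vec1_na == "yes", 1, 0)                          # NA in the input stays NA in the dummy
# [1]  1 NA  0
```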

 

Example 2: Convert Categorical Variable to Dummy Matrix Using model.matrix() Function

Example 2 explains how to create a dummy matrix based on an input vector with multiple values (i.e. a categorical variable). Let’s create another example vector in R:

vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")  # Create input vector
vec2                                                    # Print input vector
# [1] "yes"   "no"    "maybe" "yes"   "yes"   "maybe"

Our example vector consists of six character strings that are either “yes”, “no”, or “maybe”.

We can convert this vector to a dummy matrix using the model.matrix function as shown below. The - 1 in the formula removes the intercept, so that each of the three values gets its own dummy column. Note that we are also using the as.data.frame function, since this makes the output a bit prettier and easier to read (in my opinion).

dummy2 <- as.data.frame(model.matrix(~ vec2 - 1))       # Applying model.matrix function
dummy2                                                  # Print dummy
#   vec2maybe vec2no vec2yes
# 1         0      0       1
# 2         0      1       0
# 3         1      0       0
# 4         0      0       1
# 5         0      0       1
# 6         1      0       0

Have a look at the previous output of the RStudio console. Our input vector was converted to a data frame consisting of three dummy indicators that correspond to the three different values of our input vector.
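
For regression models with an intercept, using all three dummies at once leads to perfect collinearity, so one category is usually left out as the reference. The following is a minimal sketch of that variant, which also appends the dummies to a data frame. The data frame data, its id column, and the object names dummy2_ref and data_new are hypothetical and only serve as illustration:

```r
vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")  # Same input vector as above
data <- data.frame(id = 1:6, vec2 = vec2)               # Hypothetical data frame

dummy2_ref <- model.matrix(~ vec2, data = data)[ , -1]  # Keep intercept in formula, then drop its column
dummy2_ref                                              # "maybe" is the reference category
#   vec2no vec2yes
# 1      0       1
# 2      1       0
# 3      0       0
# 4      0       1
# 5      0       1
# 6      0       0

data_new <- cbind(data, dummy2_ref)                     # Append dummy columns to the data frame
```

The reference category is the first factor level, which is "maybe" here because the levels of a character vector are sorted alphabetically.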

 

Example 3: Generate Random Dummy Vector Using rbinom() Function

It is also possible to generate random binomial dummy indicators using the rbinom function.

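The following R code generates a dummy that is equal to 1 with a probability of 30% and equal to 0 with a probability of 70%. Since the values are drawn at random, the share of 1s in a small sample will usually not be exactly 30%: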

set.seed(9376562)                                       # Set random seed
dummy3 <- rbinom(n = 10, size = 1, prob = 0.3)          # Applying rbinom function
dummy3                                                  # Print dummy
# [1] 1 0 0 1 0 1 0 1 0 0
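
To check that the share of 1s approaches the specified probability, a larger sample can be drawn. This is just a minimal sketch; the object name dummy3_large and the sample size of 100000 are assumptions made for illustration:

```r
set.seed(9376562)                                        # Same seed as above
dummy3_large <- rbinom(n = 100000, size = 1, prob = 0.3) # Hypothetical larger sample
mean(dummy3_large)                                       # Share of 1s, should be close to 0.3
```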

 

Video, Further Resources & Summary

Do you need more info on the R code of this tutorial? Then I can recommend watching the accompanying video on the Statistics Globe YouTube channel, in which I explain the R code of the present article.

In addition, you might want to have a look at the related articles that I have published on https://www.statisticsglobe.com/.

 

You learned in this tutorial how to make a dummy in the R programming language – an approach that is often used when building statistical models or for one-hot encoding in case of machine learning applications.

If you have additional questions, please let me know in the comments. In addition, don’t forget to subscribe to my email newsletter for updates on new tutorials.

 



12 Comments

  • Hello, nice post.
    But how could I create a dummy between two dates?
    For example, we have a time series vector from 01/01/2000 to 30/12/2020 (daily, monthly, or quarterly is possible too) and we are interested in capturing the dates 15/02/2008-17/02/2008.
    Thanks for your post.

    • Hi John,

      Thank you for the nice feedback!

      Regarding your question:

      You can specify any logical condition within the ifelse function. In your case, you may use the following R code:

      ifelse(time_series %in% c("15/02/2008", "16/02/2008", "17/02/2008"), 1, 0)
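
      If time_series is stored as a Date object rather than as character strings, a range comparison may be more convenient than listing each day. A minimal sketch, assuming a hypothetical daily date vector (the names win_start, win_end, and date_dummy as well as the example dates are made up for illustration):

      ```r
      time_series <- seq(as.Date("2008-02-13"), as.Date("2008-02-19"), by = "day")  # Hypothetical daily dates
      win_start <- as.Date("15/02/2008", format = "%d/%m/%Y")                       # First day of the window
      win_end   <- as.Date("17/02/2008", format = "%d/%m/%Y")                       # Last day of the window
      date_dummy <- ifelse(time_series >= win_start & time_series <= win_end, 1, 0) # 1 inside the window, else 0
      date_dummy
      # [1] 0 0 1 1 1 0 0
      ```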

      I hope that helps!

      Joachim

  • Joachim,
    How do you take a categorical variable in an existing dataframe, convert to multiple dummies and overwrite back into the dataframe?

  • Hi, how do we create a dummy for multiple categories of a single variable? Thanks

  • Hi, thank you for the nice tutorial!
    What if we already have a dummy in our data and we get an error in lm?

  • I would like to create a dummy variable for a categorical variable with numerical values. This categorical variable is called “schtype” and has the values 1, 2, 3, and 4, where 1 is public, 2 is catholic, 3 is private religious, and 4 is private. How would I go about creating a dummy variable?

    • Hi Gemma,

      You would have to convert your numeric data to the character class first. Have a look at the following example:

      x <- as.character(1:4)
      as.data.frame(model.matrix(~ x - 1))
      #   x1 x2 x3 x4
      # 1  1  0  0  0
      # 2  0  1  0  0
      # 3  0  0  1  0
      # 4  0  0  0  1
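
      If the dummy columns should carry the category names instead of the numbers, a factor with labels could be used as well. A minimal sketch, assuming a hypothetical numeric vector schtype (the example values and the names schtype_fac and schtype_dummies are made up for illustration):

      ```r
      schtype <- c(1, 2, 3, 4, 2, 1)                                  # Hypothetical numeric codes
      schtype_fac <- factor(schtype, levels = 1:4,
                            labels = c("public", "catholic",
                                       "private_religious", "private")) # Attach category names
      schtype_dummies <- as.data.frame(model.matrix(~ schtype_fac - 1)) # One dummy column per school type
      schtype_dummies
      ```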

      Regards,
      Joachim

  • I would like to create a dummy variable that indicates whether the “TokenID” appears once in the dataset, and another dummy that indicates whether it appears more than once (in other words, sold for the first time or several times). What would the code look like?
    Thanks in advance

    • Hey Carolina,

      Thank you for the kind comment, glad you like the tutorial!

      Please excuse the late response. I was on a long holiday so unfortunately I wasn’t able to reply sooner. Still need help with your code?

      Regards,
      Joachim

