Create Dummy Variable in R (3 Examples)

 

This tutorial shows how to generate dummy variables in the R programming language.

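The tutorial consists of three examples. Example 1 converts a character vector with two values to a dummy using ifelse(), Example 2 converts a categorical variable to a dummy matrix using model.matrix(), and Example 3 generates a random dummy vector using rbinom(). The tutorial closes with a video, further resources, and a summary.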

So let’s just jump right in…

 

Example 1: Convert Character String with Two Values to Dummy Using ifelse() Function

In Example 1, I’ll explain how to convert a character vector (or a factor) that contains two different values to a dummy indicator. Let’s first create such a character vector in R:

vec1 <- c("yes", "no", "no", "yes", "no")               # Create input vector
vec1                                                    # Print input vector
# [1] "yes" "no"  "no"  "yes" "no"

The previous RStudio console output shows the structure of our example vector. It consists of five character strings that are either “yes” or “no”.

We can now convert this input vector to a numeric dummy indicator using the ifelse function:

dummy1 <- ifelse(vec1 == "yes", 1, 0)                   # Applying ifelse function
dummy1                                                  # Print dummy
# [1] 1 0 0 1 0

Our dummy vector is equal to 1 wherever the input vector is “yes” and equal to 0 wherever it is “no”. Strictly speaking, ifelse() returns 0 for any value other than “yes”, and NA for missing values.
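The comparison vec1 == "yes" already returns a logical vector, so wrapping it in as.integer() gives the same 0/1 coding without ifelse(). The only difference is that the result is of type integer rather than double. A minimal sketch (the name dummy1_int is just illustrative):

```r
vec1 <- c("yes", "no", "no", "yes", "no")               # Same input vector as above
dummy1_int <- as.integer(vec1 == "yes")                 # TRUE becomes 1L, FALSE becomes 0L
dummy1_int                                              # Print dummy
# [1] 1 0 0 1 0
```

As with ifelse(), a missing value in vec1 stays NA in the dummy.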

 

Example 2: Convert Categorical Variable to Dummy Matrix Using model.matrix() Function

Example 2 explains how to create a dummy matrix based on an input vector with multiple values (i.e. a categorical variable). Let’s create another example vector in R:

vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")  # Create input vector
vec2                                                    # Print input vector
# [1] "yes"   "no"    "maybe" "yes"   "yes"   "maybe"

Our example vector consists of six character strings that are either “yes”, “no”, or “maybe”.

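We can convert this vector to a dummy matrix using the model.matrix function as shown below. The - 1 in the formula removes the intercept, so that model.matrix returns one column for each of the three categories instead of dropping the first category as a reference level. Note that we are also wrapping the result in the as.data.frame function, since this makes the output a bit prettier and easier to read (in my opinion).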

dummy2 <- as.data.frame(model.matrix(~ vec2 - 1))       # Applying model.matrix function
dummy2                                                  # Print dummy
#   vec2maybe vec2no vec2yes
# 1         0      0       1
# 2         0      1       0
# 3         1      0       0
# 4         0      0       1
# 5         0      0       1
# 6         1      0       0

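Have a look at the previous output of the RStudio console. Our input vector was converted to a data frame consisting of three dummy indicators, one for each distinct value of our input vector. The columns are ordered alphabetically (maybe, no, yes), because model.matrix first converts the character vector to a factor with sorted levels.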
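Without the - 1, model.matrix() keeps an intercept column and drops the first factor level, which then serves as the reference category. This is the k - 1 dummy coding that regression functions such as lm() use by default. A minimal sketch, reusing the vec2 values from above; relevel() is used to pick a different reference level (the names dummy2_ref, vec2_fac, and dummy2_no are illustrative):

```r
vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")  # Same input vector as above
dummy2_ref <- model.matrix(~ vec2)                      # Intercept kept, "maybe" is the reference level
colnames(dummy2_ref)                                    # Print column names
# [1] "(Intercept)" "vec2no"      "vec2yes"

vec2_fac <- relevel(factor(vec2), ref = "no")           # Make "no" the reference level instead
dummy2_no <- model.matrix(~ vec2_fac)                   # Now "no" gets no column of its own
colnames(dummy2_no)                                     # Print column names
# [1] "(Intercept)"   "vec2_facmaybe" "vec2_facyes"
```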

 

Example 3: Generate Random Dummy Vector Using rbinom() Function

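It is also possible to generate random dummy indicators using the rbinom function. With size = 1, each draw is a single Bernoulli trial that returns either 0 or 1.

The following R code generates a dummy in which each value is 1 with probability 0.3 and 0 with probability 0.7. On average, about 30% of the values are 1, but the exact share varies from sample to sample: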

set.seed(9376562)                                       # Set random seed
dummy3 <- rbinom(n = 10, size = 1, prob = 0.3)          # Applying rbinom function
dummy3                                                  # Print dummy
# [1] 1 0 0 1 0 1 0 1 0 0
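Because rbinom() draws every element independently, it fixes only the probability, not the exact number of 1s. If you need exactly 30% ones (here 3 of 10), one option is to shuffle a fixed vector with sample(). A minimal sketch; the seed value and the name dummy3_exact are arbitrary:

```r
set.seed(9376562)                                       # Set random seed (any value works)
dummy3_exact <- sample(rep(c(1, 0), times = c(3, 7)))   # Three 1s and seven 0s in random order
sum(dummy3_exact)                                       # Number of 1s
# [1] 3
mean(dummy3_exact)                                      # Share of 1s
# [1] 0.3
```

Only the order of the values is random here, so the share of 1s is exactly 0.3 for every seed.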

 

Video, Further Resources & Summary

Do you need more info on the R code of this tutorial? Then I recommend watching the following video on the Statistics Globe YouTube channel, in which I explain the R code of the present article:

 


 

In addition, you might want to have a look at the related articles that I have published on https://www.statisticsglobe.com/.

 

In this tutorial, you learned how to create dummy variables in the R programming language. If you have additional questions, please let me know in the comments. In addition, don’t forget to subscribe to my email newsletter for updates on new tutorials.

 



2 Comments

  • Hello, nice post.
    But how could I create a dummy for the period between two dates?
    For example, we have a time series vector from 01/01/2000 to 30/12/2020 (daily, monthly, or quarterly data are possible too), and we are interested in capturing the dates 15/02/2008-17/02/2008.
    Thanks for your post.

    • Hi John,

      Thank you for the nice feedback!

      Regarding your question:

      You can specify any logical condition within the ifelse function. In your case, you may use the following R code:

      ifelse(time_series %in% c("15/02/2008", "16/02/2008", "17/02/2008"), 1, 0)

      I hope that helps!

      Joachim

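For a longer period, it can be easier to convert the strings to Date values and test the range with >= and <= inside ifelse(), instead of listing every single date. A minimal sketch, assuming the series is stored as character strings in dd/mm/yyyy format; the short example series below is hypothetical:

```r
# Hypothetical daily series from 10/02/2008 to 20/02/2008, stored as "dd/mm/yyyy" strings
time_series <- format(seq(as.Date("2008-02-10"), as.Date("2008-02-20"), by = "day"), "%d/%m/%Y")

dates <- as.Date(time_series, format = "%d/%m/%Y")      # Parse strings to Date values
dummy_dates <- ifelse(dates >= as.Date("2008-02-15") &  # 1 inside the range (inclusive)
                      dates <= as.Date("2008-02-17"), 1, 0)
dummy_dates                                             # Print dummy
# [1] 0 0 0 0 0 1 1 1 0 0 0
```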
