Create Dummy Variable in R (3 Examples)
This tutorial shows how to generate dummy variables in the R programming language.
So let’s just jump right in…
Example 1: Convert Character String with Two Values to Dummy Using ifelse() Function
In Example 1, I’ll explain how to convert a character vector (or a factor) that contains two different values to a dummy indicator. Let’s first create such a character vector in R:
vec1 <- c("yes", "no", "no", "yes", "no")   # Create input vector
vec1                                        # Print input vector
# [1] "yes" "no" "no" "yes" "no"
The previous RStudio console output shows the structure of our example vector. It consists of five character strings that are either “yes” or “no”.
We can now convert this input vector to a numeric dummy indicator using the ifelse function:
dummy1 <- ifelse(vec1 == "yes", 1, 0)       # Applying ifelse function
dummy1                                      # Print dummy
# [1] 1 0 0 1 0
Our dummy vector is equal to 1 wherever the input vector is equal to “yes”, and equal to 0 wherever the input vector is equal to “no”.
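Since the comparison vec1 == "yes" already returns TRUE or FALSE, the logical result can also be coerced to 0/1 directly with as.integer(). The following sketch shows this alternative and also illustrates, using a hypothetical vector vec_na, that both approaches keep missing input values as NA instead of turning them into 0:

```r
vec1 <- c("yes", "no", "no", "yes", "no")

# Logical comparison coerced to integer: TRUE -> 1, FALSE -> 0
dummy1_int <- as.integer(vec1 == "yes")
dummy1_int
# [1] 1 0 0 1 0

# Hypothetical input with a missing value: NA stays NA in both approaches
vec_na <- c("yes", NA, "no")
ifelse(vec_na == "yes", 1, 0)
# [1]  1 NA  0
as.integer(vec_na == "yes")
# [1]  1 NA  0
```

If missing values should be coded as 0 instead, they have to be handled explicitly, e.g. with an additional is.na() condition.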
Example 2: Convert Categorical Variable to Dummy Matrix Using model.matrix() Function
Example 2 explains how to create a dummy matrix based on an input vector with multiple values (i.e. a categorical variable). Let’s create another example vector in R:
vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")   # Create input vector
vec2                                                     # Print input vector
# [1] "yes" "no" "maybe" "yes" "yes" "maybe"
Our example vector consists of six character strings that are either “yes”, “no”, or “maybe”.
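We can convert this vector to a dummy matrix using the model.matrix function as shown below. The “- 1” in the formula ~ vec2 - 1 removes the intercept, so that model.matrix returns one dummy column for each of the three categories. Note that we also wrap the result in the as.data.frame function, since this makes the output a bit prettier and easier to read (in my opinion).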
dummy2 <- as.data.frame(model.matrix(~ vec2 - 1))   # Applying model.matrix function
dummy2                                              # Print dummy
#   vec2maybe vec2no vec2yes
# 1         0      0       1
# 2         0      1       0
# 3         1      0       0
# 4         0      0       1
# 5         0      0       1
# 6         1      0       0
Have a look at the previous output of the RStudio console. Our input vector was converted to a data frame consisting of three dummy indicators that correspond to the three different values of our input vector.
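For regression models with an intercept, you typically want one dummy fewer than there are categories; otherwise the dummies sum to the intercept column and are perfectly collinear (the so-called dummy variable trap). A short sketch: leaving out the “- 1” makes model.matrix keep the intercept and drop the dummy of the first factor level, which then acts as the reference category. The name dummy2_ref is hypothetical:

```r
vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")

# With an intercept, the first level ("maybe", alphabetically first) is dropped
# and serves as the reference category
dummy2_ref <- model.matrix(~ vec2)

# Columns: "(Intercept)", "vec2no", "vec2yes"
colnames(dummy2_ref)
dummy2_ref[, "vec2no"]
```

Each remaining dummy is then interpreted relative to the reference category “maybe”.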
Example 3: Generate Random Dummy Vector Using rbinom() Function
It is also possible to generate random dummy indicators using the rbinom function, i.e. by drawing from a binomial distribution with size = 1.
The following R code generates ten dummy values, each of which is equal to 1 with a probability of 30% and equal to 0 with a probability of 70%:
set.seed(9376562)                                # Set random seed
dummy3 <- rbinom(n = 10, size = 1, prob = 0.3)   # Applying rbinom function
dummy3                                           # Print dummy
# [1] 1 0 0 1 0 1 0 1 0 0
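Because rbinom draws each value independently, the share of ones only equals 30% on average; in the output above, four of the ten values are 1. If you need exactly 30% ones, one option is to shuffle a fixed vector of ones and zeros with sample(). A minimal sketch, where the seed and the name dummy_exact are arbitrary choices:

```r
set.seed(123)   # Arbitrary seed for reproducibility

# Exactly 3 ones and 7 zeros, placed in random order
dummy_exact <- sample(rep(c(1, 0), times = c(3, 7)))

dummy_exact
mean(dummy_exact)
# [1] 0.3
```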
Video, Further Resources & Summary
Do you need more info on the R code of this tutorial? Then I can recommend watching the following video of the Statistics Globe YouTube channel. I explain the R programming codes of the present article in the video:
In addition, you might want to have a look at the related articles that I have published on https://www.statisticsglobe.com/:
In this tutorial, you learned how to create dummy variables in the R programming language, an approach that is often used when building statistical models or for one-hot encoding in machine learning applications.
If you have additional questions, please let me know in the comments. In addition, don’t forget to subscribe to my email newsletter for updates on new tutorials.
12 Comments
Hello, nice post.
But how could I create a dummy for the period between two dates?
For example, we have a time series vector from 01/01/2000 to 30/12/2020 (daily, monthly or quarterly data is possible too) and we are interested in capturing the dates 15/02/2008-17/02/2008.
Thanks for your post.
Hi John,
Thank you for the nice feedback!
Regarding your question:
You can specify any logical condition within the ifelse function. In your case, you may use the following R code:
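A minimal sketch of such a condition, assuming the time series dates are stored as a daily Date vector (the names dates and dummy_period are hypothetical):

```r
# Hypothetical daily date sequence covering the period from the question
dates <- seq(as.Date("2000-01-01"), as.Date("2020-12-30"), by = "day")

# 1 for dates between 15/02/2008 and 17/02/2008 (inclusive), 0 otherwise
dummy_period <- ifelse(dates >= as.Date("2008-02-15") &
                       dates <= as.Date("2008-02-17"), 1, 0)

sum(dummy_period)
# [1] 3
dates[dummy_period == 1]
# [1] "2008-02-15" "2008-02-16" "2008-02-17"
```

The same logical condition works for monthly or quarterly Date vectors; only the sequence passed to seq() changes.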
I hope that helps!
Joachim
Joachim,
How do you take a categorical variable in an existing dataframe, convert to multiple dummies and overwrite back into the dataframe?
Hey William,
Could you explain what you mean with “overwrite back into the dataframe”?
Regards,
Joachim
Hi, how do we create a dummy for multiple categories of a single variable? Thanks
Hey Oshin,
Did you have a look at Example 2? I think it explains your question.
Regards,
Joachim
Hi thank you for nice tutorial!
What if we already have a dummy in our data and we get an error in lm()?
Hey,
Thank you, glad you find the tutorial useful!
Could you share your code and the error message?
Regards,
Joachim
I would like to create dummy variables for a categorical variable with numerical values. This categorical variable is called “schtype” and has the values 1, 2, 3, and 4: 1 is public, 2 is catholic, 3 is private religious, and 4 is private. How would I go about creating the dummy variables?
Hi Gemma,
You would have to convert your numeric data to the character class first. Have a look at the following example:
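A minimal sketch of that idea, using a hypothetical data frame called school. Here the codes are converted to a factor with descriptive labels, which, like a conversion to character, makes model.matrix() treat schtype as a categorical variable instead of a single numeric predictor:

```r
# Hypothetical data: school type coded 1 to 4
school <- data.frame(schtype = c(1, 2, 3, 4, 2, 1))

# Convert the numeric codes to a labelled factor
school$schtype <- factor(school$schtype, levels = 1:4,
                         labels = c("public", "catholic",
                                    "private_religious", "private"))

# One dummy column per school type ("- 1" drops the intercept)
dummies <- as.data.frame(model.matrix(~ schtype - 1, data = school))

# Columns: "schtypepublic", "schtypecatholic",
#          "schtypeprivate_religious", "schtypeprivate"
colnames(dummies)
```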
Regards,
Joachim
I would like to create a dummy variable if the “TokenID” appears once in the dataset and another dummy if it appears more than once (in other words, sold for the first time or several times). What would the code look like?
Thanks in advance
Hey Carolina,
Thank you for the kind comment, glad you like the tutorial!
Please excuse the late response. I was on a long holiday so unfortunately I wasn’t able to reply sooner. Still need help with your code?
Regards,
Joachim