# Create Dummy Variable in R (3 Examples)

This tutorial shows how to generate dummy variables in the R programming language.

So let’s just jump right in…

## Example 1: Convert Character String with Two Values to Dummy Using ifelse() Function

In Example 1, I’ll explain how to convert a character vector (or a factor) that contains two different values to a dummy indicator. Let’s first create such a character vector in R:

```r
vec1 <- c("yes", "no", "no", "yes", "no")               # Create input vector
vec1                                                    # Print input vector
# [1] "yes" "no"  "no"  "yes" "no"
```

The previous RStudio console output shows the structure of our example vector. It consists of five character strings that are either “yes” or “no”.

We can now convert this input vector to a numeric dummy indicator using the ifelse function:

```r
dummy1 <- ifelse(vec1 == "yes", 1, 0)                   # Applying ifelse function
dummy1                                                  # Print dummy
# [1] 1 0 0 1 0
```

Our dummy vector is equal to 1 wherever the input vector is equal to “yes”, and equal to 0 wherever the input vector is equal to “no”.
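The same approach also works for a factor, as mentioned at the beginning of this example. The following is a minimal sketch; the object names `fac1`, `dummy1_fac`, and `dummy1_int` are hypothetical and not part of the original example. It also shows `as.integer()` as a possible shortcut for the two-value case:

```r
# Hypothetical factor version of the input vector from Example 1
fac1 <- factor(c("yes", "no", "no", "yes", "no"))

# The comparison uses the factor labels, so the ifelse call stays the same
dummy1_fac <- ifelse(fac1 == "yes", 1, 0)
dummy1_fac
# [1] 1 0 0 1 0

# Possible shortcut: convert the logical comparison to integer directly
dummy1_int <- as.integer(fac1 == "yes")
dummy1_int
# [1] 1 0 0 1 0
```

In both versions, a missing value (NA) in the input stays NA in the dummy.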

## Example 2: Convert Categorical Variable to Dummy Matrix Using model.matrix() Function

Example 2 explains how to create a dummy matrix based on an input vector with multiple values (i.e. a categorical variable). Let’s create another example vector in R:

```r
vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")  # Create input vector
vec2                                                    # Print input vector
# [1] "yes"   "no"    "maybe" "yes"   "yes"   "maybe"
```

Our example vector consists of six character strings that are either “yes”, “no”, or “maybe”.

We can convert this vector to a dummy matrix using the model.matrix function as shown below. The `- 1` in the formula removes the intercept, so that each of the three values gets its own dummy column. Note that we are also using the as.data.frame function, since this makes the output a bit prettier and easier to read (in my opinion).

```r
dummy2 <- as.data.frame(model.matrix(~ vec2 - 1))       # Applying model.matrix function
dummy2                                                  # Print dummy
#   vec2maybe vec2no vec2yes
# 1         0      0       1
# 2         0      1       0
# 3         1      0       0
# 4         0      0       1
# 5         0      0       1
# 6         1      0       0
```

Have a look at the previous output of the RStudio console. Our input vector was converted to a data frame consisting of three dummy indicators that correspond to the three different values of our input vector.
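To illustrate what the `- 1` in the model.matrix formula does, the following sketch compares the column names with and without it. The object names `mm_ref` and `mm_full` are hypothetical. Without `- 1`, R keeps an intercept column and drops the dummy for the first level (here “maybe”, since the levels are sorted alphabetically), which is the form typically used inside regression models:

```r
vec2 <- c("yes", "no", "maybe", "yes", "yes", "maybe")  # Input vector of Example 2

# Without "- 1": intercept plus dummies for all levels except the first
mm_ref <- model.matrix(~ vec2)
colnames(mm_ref)
# [1] "(Intercept)" "vec2no"      "vec2yes"

# With "- 1": no intercept, one dummy column per level
mm_full <- model.matrix(~ vec2 - 1)
colnames(mm_full)
# [1] "vec2maybe" "vec2no"    "vec2yes"
```

With the full set of dummies, each row contains exactly one 1, which corresponds to one-hot encoding.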

## Example 3: Generate Random Dummy Vector Using rbinom() Function

It is also possible to generate random binomial dummy indicators using the rbinom function.

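The following R code generates a dummy in which each element is equal to 1 with a probability of 30% and equal to 0 with a probability of 70%. Because the values are drawn at random, the share of 1s in a small sample can differ from 30%: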

```r
set.seed(9376562)                                       # Set random seed
dummy3 <- rbinom(n = 10, size = 1, prob = 0.3)          # Applying rbinom function
dummy3                                                  # Print dummy
# [1] 1 0 0 1 0 1 0 1 0 0
```
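The sample above contains four 1s out of ten values, i.e. 40% instead of 30%. As a quick check, the following sketch draws a much larger sample (the name `dummy3_large` and the sample size are my own choices for illustration) and computes the share of 1s, which should be close to the specified probability:

```r
set.seed(9376562)                                          # Set random seed
dummy3_large <- rbinom(n = 100000, size = 1, prob = 0.3)   # Draw a large random dummy
mean(dummy3_large)                                         # Share of 1s, close to 0.3
table(dummy3_large)                                        # Counts of 0s and 1s
```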

## Video, Further Resources & Summary

Do you need more info on the R code of this tutorial? Then I can recommend watching the accompanying video on the Statistics Globe YouTube channel, in which I explain the R code of the present article.

In addition, you might want to have a look at the related articles that I have published on https://www.statisticsglobe.com/.

You learned in this tutorial how to make a dummy in the R programming language – an approach that is often used when building statistical models or for one-hot encoding in case of machine learning applications.

• John
February 24, 2021 9:54 pm

Hello, nice post.
But how could I create a dummy for the period between two dates?
For example, we have a time series vector from 01/01/2000 to 30/12/2020 (daily; monthly or quarterly is possible too) and we are interested in capturing the dates 15/02/2008-17/02/2008.
Thanks for your post.

• February 25, 2021 6:12 am

Hi John,

Thank you for the nice feedback!

You can specify any logical condition within the ifelse function. In your case, you may use the following R code:

`ifelse(time_series %in% c("15/02/2008", "16/02/2008", "17/02/2008"), 1, 0)`

I hope that helps!

Joachim
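For longer periods, a range comparison avoids listing every single date. The following is a minimal sketch, assuming the dates are stored as character strings in the format dd/mm/yyyy; the values of `time_series` below are hypothetical sample data:

```r
# Hypothetical daily series from 13/02/2008 to 19/02/2008 as character strings
time_series <- format(seq(as.Date("2008-02-13"), as.Date("2008-02-19"), by = "day"),
                      "%d/%m/%Y")

# Convert the strings to the Date class so they can be compared
dates <- as.Date(time_series, format = "%d/%m/%Y")

# Dummy is 1 for all dates within the period of interest (inclusive)
date_dummy <- ifelse(dates >= as.Date("2008-02-15") & dates <= as.Date("2008-02-17"), 1, 0)
date_dummy
# [1] 0 0 1 1 1 0 0
```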

• william
February 13, 2022 10:55 pm

Joachim,
How do you take a categorical variable in an existing dataframe, convert to multiple dummies and overwrite back into the dataframe?

• February 14, 2022 8:37 am

Hey William,

Could you explain what you mean by “overwrite back into the dataframe”?

Regards,
Joachim

• oshin
August 20, 2022 10:21 pm

Hi, how do we create a dummy for multiple categories of a single variable? thanks

• August 22, 2022 9:01 am

Hey Oshin,

Did you have a look at Example 2? I think it explains your question.

Regards,
Joachim

• R
September 6, 2022 8:55 am

Hi, thank you for the nice tutorial!
What if we already have a dummy in our data and we get an error in lm?

• September 6, 2022 9:01 am

Hey,

Thank you, glad you find the tutorial useful!

Could you share your code and the error message?

Regards,
Joachim

• Gemma
September 30, 2022 3:32 am

I would like to create a dummy variable for a categorical variable with numerical values. This categorical variable is called “schtype” and has the values 1, 2, 3, 4, where 1 is public, 2 is catholic, 3 is private religious, and 4 is private. How would I go about creating a dummy variable?

• September 30, 2022 5:53 pm

Hi Gemma,

You would have to convert your numeric data to the character class first. Have a look at the following example:

```r
x <- as.character(1:4)
as.data.frame(model.matrix(~ x - 1))
#   x1 x2 x3 x4
# 1  1  0  0  0
# 2  0  1  0  0
# 3  0  0  1  0
# 4  0  0  0  1
```

Regards,
Joachim
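As a possible alternative to the character conversion, the numeric codes could also be turned into a labeled factor, which gives the dummy columns readable names. The following is only a sketch: the `schtype` values are hypothetical sample data, and the labels follow the coding described in the question:

```r
schtype <- c(1, 2, 3, 4, 2, 1)                          # Hypothetical example data

# Convert the numeric codes to a factor with descriptive labels
schtype_fac <- factor(schtype,
                      levels = 1:4,
                      labels = c("public", "catholic", "private_religious", "private"))

# One dummy column per school type
dummy_sch <- as.data.frame(model.matrix(~ schtype_fac - 1))
colnames(dummy_sch) <- levels(schtype_fac)              # Remove the "schtype_fac" prefix
dummy_sch
```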

• Carolina R.
October 12, 2022 7:13 am

I would like to create a dummy variable if the “TokenID” appears once in the dataset and another dummy if it appears more than once (in other words, sold for the first time or several times). What would the code look like?

• November 14, 2022 12:30 pm