Convert Factor to Dummy Indicator Variables for Every Level in R (Example)

 

This page explains how to expand a factor column to dummy variables for each factor level in the R programming language.

The tutorial first creates some example data and then shows how to convert a factor to 1/0 dummy indicators.


Creation of Example Data

In this tutorial, we’ll use the following data frame as example data:

data <- data.frame(x1 = c("a", "b", "a", "XXX", "C", "b", "abc"),   # Create example data
                   x2 = 1,
                   x3 = 2)
data                                                                # Print example data
#    x1 x2 x3
# 1   a  1  2
# 2   b  1  2
# 3   a  1  2
# 4 XXX  1  2
# 5   C  1  2
# 6   b  1  2
# 7 abc  1  2

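Our example data consists of seven rows and three columns. The first column, i.e. the variable x1, contains five different categories. Note that data.frame() stores such a column as a factor only in R versions before 4.0.0; in newer versions it is a character vector unless you specify stringsAsFactors = TRUE. The model.matrix function used in this tutorial treats character columns like factors, so the example works in both cases.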
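
Because the default for storing strings changed in R 4.0.0, it can be worth checking how x1 is stored before creating dummies. The following minimal sketch recreates the example data, converts x1 explicitly with factor(), and counts the levels. The object name data_fac is only an assumption made for this illustration.

```r
data <- data.frame(x1 = c("a", "b", "a", "XXX", "C", "b", "abc"),   # Recreate example data
                   x2 = 1,
                   x3 = 2)
class(data$x1)                                                      # "factor" or "character", depending on R version

data_fac <- data                                                    # Copy of data (hypothetical name)
data_fac$x1 <- factor(data_fac$x1)                                  # Explicit conversion to factor
nlevels(data_fac$x1)                                                # Count factor levels
# [1] 5
levels(data_fac$x1)                                                 # Order of levels may depend on your locale
```

An explicit conversion like this makes the code independent of the stringsAsFactors default.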

 

Example: Converting Factor to 1/0 Dummy Indicator

If we want to expand our data frame so that every factor level of x1 is represented in a dummy column, we can use the model.matrix function as shown below:

model.matrix( ~ x1 - 1, data)                                       # Convert to dummies
#   x1a x1abc x1b x1C x1XXX
# 1   1     0   0   0     0
# 2   0     0   1   0     0
# 3   1     0   0   0     0
# 4   0     0   0   0     1
# 5   0     0   0   1     0
# 6   0     0   1   0     0
# 7   0     1   0   0     0
# attr(,"assign")
# [1] 1 1 1 1 1
# attr(,"contrasts")
# attr(,"contrasts")$x1
# [1] "contr.treatment"

As the output shows, the previous R syntax returns a dummy matrix with one 1/0 column for each level of x1.
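
The - 1 in the formula removes the intercept, which is why every level gets its own column. As a small sketch of the difference, the code below recreates the example data and compares the result with and without - 1, assuming the default treatment contrasts. The object names dummies_all and dummies_ref are assumptions made for this illustration.

```r
data <- data.frame(x1 = c("a", "b", "a", "XXX", "C", "b", "abc"),   # Recreate example data
                   x2 = 1,
                   x3 = 2)

dummies_all <- model.matrix( ~ x1 - 1, data)                        # No intercept: one column per level
dummies_ref <- model.matrix( ~ x1, data)                            # Intercept: first level becomes reference

ncol(dummies_all)                                                   # Five dummy columns
# [1] 5
colnames(dummies_ref)[1]                                            # Intercept replaces first level
# [1] "(Intercept)"
ncol(dummies_ref)                                                   # Intercept plus four dummies
# [1] 5
```

Without - 1, the first level has no column of its own, so rows belonging to that level contain 0 in all four dummy columns.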

If we want to combine these dummies with the remaining columns of our original data frame, we can use the following R programming code:

data_dummy <- data.frame(data[ , ! colnames(data) %in% "x1"],       # Create dummy data
                         model.matrix( ~ x1 - 1, data))
data_dummy                                                          # Print dummy data
#   x2 x3 x1a x1abc x1b x1C x1XXX
# 1  1  2   1     0   0   0     0
# 2  1  2   0     0   1   0     0
# 3  1  2   1     0   0   0     0
# 4  1  2   0     0   0   0     1
# 5  1  2   0     0   0   1     0
# 6  1  2   0     0   1   0     0
# 7  1  2   0     1   0   0     0

The final output consists of the variables of our original data frame (except x1) plus the dummy variables that reflect the factor levels of x1.
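
To see what the dummy columns encode, here is a minimal base R sketch that builds the same kind of indicators by hand, comparing x1 against each of its levels. It is meant as an illustration of the idea rather than a replacement for model.matrix. The names x1_fac and manual are hypothetical.

```r
data <- data.frame(x1 = c("a", "b", "a", "XXX", "C", "b", "abc"),   # Recreate example data
                   x2 = 1,
                   x3 = 2)

x1_fac <- factor(data$x1)                                           # Make sure x1 is a factor
manual <- sapply(levels(x1_fac),                                    # 1 if row has this level, else 0
                 function(lev) as.integer(x1_fac == lev))
colnames(manual) <- paste0("x1", levels(x1_fac))                    # Same naming scheme as model.matrix

rowSums(manual)                                                     # Each row belongs to exactly one level
# [1] 1 1 1 1 1 1 1
all(manual == model.matrix( ~ x1 - 1, data.frame(x1 = x1_fac)))     # Compare with model.matrix
# [1] TRUE
```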

 

Summary

At this point of the article, you should know how to expand a factor column into dummy variables in R programming. Let me know in the comments if you have additional questions.

 


