Convert Factor to Dummy Indicator Variables for Every Level in R (Example)

 

This page explains how to expand a factor column to dummy variables for each factor level in the R programming language.

The tutorial first creates an example data frame and then shows how to convert a factor column to 1/0 dummy indicators.

Here’s the step-by-step process:

 

Creation of Example Data

In this tutorial, we’ll use the following data frame:

data <- data.frame(x1 = c("a", "b", "a", "XXX", "C", "b", "abc"),   # Create example data
                   x2 = 1,
                   x3 = 2,
                   stringsAsFactors = TRUE)                         # Store x1 as factor (default is FALSE since R 4.0.0)
data                                                                # Print example data
#    x1 x2 x3
# 1   a  1  2
# 2   b  1  2
# 3   a  1  2
# 4 XXX  1  2
# 5   C  1  2
# 6   b  1  2
# 7 abc  1  2

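Our example data consists of seven rows and three columns. The first column, x1, is a categorical variable with five different values. Note that since R 4.0.0, data.frame() stores strings as character vectors by default, so x1 is only a true factor if it is created with stringsAsFactors = TRUE or wrapped in factor(). The model.matrix function used in this tutorial handles both cases, because it treats character columns as factors.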
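
To check how x1 is stored before converting it, a short inspection like the one below can help. This is only a sketch: it rebuilds the example data so that it runs on its own, and wrapping x1 in factor() is an assumption made here so that the result does not depend on the R version.

```r
# Rebuild the example data; factor() is used explicitly (an assumption of this sketch)
data <- data.frame(x1 = factor(c("a", "b", "a", "XXX", "C", "b", "abc")),
                   x2 = 1,
                   x3 = 2)

class(data$x1)       # Storage type of x1
nlevels(data$x1)     # Number of distinct factor levels
levels(data$x1)      # The levels themselves (sort order depends on the locale)
table(data$x1)       # How often each level occurs
```

Each level will later become one dummy column, so nlevels() tells us in advance how many columns to expect.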

 

Example: Converting Factor to 1/0 Dummy Indicator

If we want to expand our data frame so that every factor level of x1 is represented in a dummy column, we can use the model.matrix function as shown below:

model.matrix( ~ x1 - 1, data)                                       # Convert to dummies
#   x1a x1abc x1b x1C x1XXX
# 1   1     0   0   0     0
# 2   0     0   1   0     0
# 3   1     0   0   0     0
# 4   0     0   0   0     1
# 5   0     0   0   1     0
# 6   0     0   1   0     0
# 7   0     1   0   0     0
# attr(,"assign")
# [1] 1 1 1 1 1
# attr(,"contrasts")
# attr(,"contrasts")$x1
# [1] "contr.treatment"

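As the output shows, the previous R syntax returns a dummy matrix representing our variable x1: each row contains a 1 in the column of its level and a 0 everywhere else. The - 1 in the formula removes the intercept, which is why every level gets its own column instead of one level being dropped as the reference category.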
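
To illustrate the role of the intercept, the following sketch compares the formula with and without - 1. The object names mat_full, mat_ref, and dummies are chosen only for this illustration, and x1 is again created with factor() so that the code is self-contained.

```r
# Self-contained example data (x1 created explicitly as a factor)
data <- data.frame(x1 = factor(c("a", "b", "a", "XXX", "C", "b", "abc")))

mat_full <- model.matrix( ~ x1 - 1, data)   # No intercept: one column per level
mat_ref  <- model.matrix( ~ x1, data)       # With intercept: first level becomes the reference

ncol(mat_full)                              # Number of dummy columns without intercept
colnames(mat_ref)                           # "(Intercept)" plus one column per non-reference level
rowSums(mat_full)                           # Every row contains exactly one 1

dummies <- as.data.frame(mat_full)          # Plain data frame without the "assign" and "contrasts" attributes
```

The version with an intercept is what regression functions such as lm typically build internally, whereas the version without it is the one to use when every level should be visible as its own column.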

If we want to combine these dummies with the remaining columns of our original data frame, we can use the following R programming code:

data_dummy <- data.frame(data[ , ! colnames(data) %in% "x1"],       # Create dummy data
                         model.matrix( ~ x1 - 1, data))
data_dummy                                                          # Print dummy data
#   x2 x3 x1a x1abc x1b x1C x1XXX
# 1  1  2   1     0   0   0     0
# 2  1  2   0     0   1   0     0
# 3  1  2   1     0   0   0     0
# 4  1  2   0     0   0   0     1
# 5  1  2   0     0   0   1     0
# 6  1  2   0     0   1   0     0
# 7  1  2   0     1   0   0     0

The final output consists of the variables of our original data frame (except x1) plus the dummy variables that reflect the factor levels of x1.
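
If the same conversion is needed for several data frames or columns, the steps can be wrapped in a small function. The helper below is a hypothetical sketch, not part of base R: the name add_dummies and its arguments are made up for this illustration. It assumes that the column contains no missing values, because model.matrix drops rows with NA by default and the row counts would then no longer match.

```r
# Hypothetical helper (not part of base R): replace one column by its dummy columns
add_dummies <- function(df, col) {
  form <- reformulate(col, intercept = FALSE)        # Builds the formula ~ col - 1
  dummies <- model.matrix(form, df)                  # One 1/0 column per level
  rest <- df[ , colnames(df) != col, drop = FALSE]   # drop = FALSE keeps a data frame
  data.frame(rest, dummies)                          # Combine remaining columns and dummies
}

# Usage with the example data of this tutorial
data <- data.frame(x1 = factor(c("a", "b", "a", "XXX", "C", "b", "abc")),
                   x2 = 1,
                   x3 = 2)
data_dummy <- add_dummies(data, "x1")
dim(data_dummy)                                      # Rows and columns of the result
```

Using drop = FALSE is a deliberate choice here: without it, a data frame with only one remaining column would be simplified to a vector, and the column name would be lost.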

 

Video, Further Resources & Summary


At this point, you should know how to automatically expand a factor column into dummy variables in R. Let me know in the comments if you have additional questions. In addition, please subscribe to my email newsletter to get updates on new posts.

 


