R Error: contrasts can be applied only to factors with 2 or more levels

 

This tutorial illustrates how to handle the error message “contrasts can be applied only to factors with 2 or more levels” in the R programming language.

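The article contains this content:

  • Creation of Exemplifying Data
  • Example 1: Reproduce the Error: contrasts can be applied only to factors with 2 or more levels
  • Example 2: Fix the Error: contrasts can be applied only to factors with 2 or more levels
  • Video & Further Resources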

Here’s how to do it…

 

Creation of Exemplifying Data

The following data will be used as the basis for this R programming tutorial:

data <- data.frame(x1 = c(1, 4, 3, 1, 5, 5),          # Create example data
                   x2 = c(7, 7, 7, 1, 1, 2),
                   x3 = as.factor(5),
                   y = c(4, 3, 2, 5, 5, 1))
data                                                  # Print example data
#   x1 x2 x3 y
# 1  1  7  5 4
# 2  4  7  5 3
# 3  3  7  5 2
# 4  1  1  5 5
# 5  5  1  5 5
# 6  5  2  5 1

As you can see based on the previous output of the RStudio console, the example data has six rows and four columns. The variables x1, x2, and x3 are our predictors, and the variable y is our target variable.

 

Example 1: Reproduce the Error: contrasts can be applied only to factors with 2 or more levels

The following R code shows how to replicate the error message “contrasts can be applied only to factors with 2 or more levels”.

Let’s assume that we want to estimate a linear model of our data using the lm function in R:

lm(y ~ ., data)                                       # Trying to apply lm()
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
#   contrasts can be applied only to factors with 2 or more levels

As you can see, the lm function returned the error “contrasts can be applied only to factors with 2 or more levels” to the RStudio console.

The reason for this is that one of our predictor variables has only one factor level.

Have a look at the column x3. As you can see, this column consists only of the value five.
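
A quick way to confirm this is to count the factor levels directly. The following is a minimal sketch that recreates the example data from above, so that the snippet runs on its own, and uses the base R functions is.factor and nlevels. The object names factor_cols and level_count are only chosen for illustration:

```r
data <- data.frame(x1 = c(1, 4, 3, 1, 5, 5),          # Recreate example data
                   x2 = c(7, 7, 7, 1, 1, 2),
                   x3 = as.factor(5),
                   y = c(4, 3, 2, 5, 5, 1))

nlevels(data$x3)                                      # Count factor levels of x3
# [1] 1

factor_cols <- sapply(data, is.factor)                # Identify factor columns
level_count <- sapply(data[factor_cols], nlevels)     # Count levels of each factor
level_count                                           # Print level counts
# x3 
#  1
```

Any factor that shows a level count of 1 here is a candidate for causing the error.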

Please note that this error message might also occur when your data contains NA values, even when all of your factor columns consist of more than one factor level.

The reason for this is that, by default, the lm function performs listwise deletion, i.e. it removes all rows with NA values in the model variables.

If a factor column in the retained complete data has only one level left, the error message “contrasts can be applied only to factors with 2 or more levels” appears.
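
To illustrate the NA case, here is a minimal sketch with a hypothetical data frame called data_na, which is not part of the original example. The factor x3 has two levels in the full data, but only one level remains in the rows without NA values:

```r
data_na <- data.frame(x1 = c(1, 4, 3, NA, NA, NA),    # Hypothetical data with NA values
                      x3 = factor(c("a", "a", "a", "b", "b", "b")),
                      y = c(4, 3, 2, 5, 5, 1))

nlevels(data_na$x3)                                   # Two levels in the full data
# [1] 2

data_complete <- droplevels(na.omit(data_na))         # Keep complete rows, drop unused levels
nlevels(data_complete$x3)                             # Only one level is left
# [1] 1

result <- tryCatch(lm(y ~ ., data_na),                # lm() fails despite two levels
                   error = function(e) conditionMessage(e))
result                                                # Print the caught error message
# [1] "contrasts can be applied only to factors with 2 or more levels"
```

The tryCatch call is only used here to capture the error message as a character string instead of stopping the script.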

So how can we solve this problem? That’s what I’m going to show next!

 

Example 2: Fix the Error: contrasts can be applied only to factors with 2 or more levels

Example 2 shows how to deal with the error message “contrasts can be applied only to factors with 2 or more levels”.

As explained in Example 1, this error occurs due to one-level factor variables. So the first step is to identify those variables in our data.

We can do that by using the sapply and lapply functions in combination with the unique and length functions:

values_count <- sapply(lapply(data, unique), length)  # Identify variables with 1 value
values_count                                          # Print counts of different values
# x1 x2 x3  y 
#  4  3  1  5

The previous R code returned a named vector showing the number of different values in each of our columns. As you can see, the variable x3 contains the same value in each data cell.

We can now use this vector to subset our data frame within the lm function so that only variables with more than one value are used as predictors:

lm(y ~ ., data[ , values_count > 1])                  # Apply lm() to subset of data
# Call:
# lm(formula = y ~ ., data = data[, values_count > 1])
# 
# Coefficients:
# (Intercept)           x1           x2  
#      5.8534      -0.4788      -0.2409

The lm function returns a valid output – looks good!
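
If the error is caused by NA values, the same idea can be applied to the complete rows only. Keep in mind that unique counts NA as a value of its own, so the counts should be taken after the incomplete rows have been removed. The following is a sketch under the assumption that dropping the affected predictor is acceptable for your analysis. The data frame data_na is hypothetical and not part of the original example:

```r
data_na <- data.frame(x1 = c(1, 4, 3, 2, NA, NA),     # Hypothetical data with NA values
                      x2 = c(7, 5, 7, 1, 1, 2),
                      x3 = factor(c("a", "a", "a", "a", "b", "b")),
                      y = c(4, 3, 2, 5, 5, 1))

data_complete <- na.omit(data_na)                     # Rows that lm() would keep
values_count <- sapply(lapply(data_complete, unique), length)  # Count values in complete rows
values_count                                          # Print counts of different values
# x1 x2 x3  y 
#  4  3  1  4

mod <- lm(y ~ ., data_complete[ , values_count > 1])  # Fit model without constant columns
names(coef(mod))                                      # Print names of estimated coefficients
# [1] "(Intercept)" "x1"          "x2"
```

In this sketch, x3 is removed because it is constant in the complete rows, even though it has two levels in the full data.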

 

Video & Further Resources

In this tutorial you have learned how to deal with the error “contrasts can be applied only to factors with 2 or more levels” in the R programming language. Don’t hesitate to let me know in the comments if you have additional questions.



8 Comments

  • Mayur Dhage
    April 4, 2021 6:12 pm

    Hello Joachim! Excellent explanation as always.
    I am building a logistic regression model in R and am getting the same “error in contrast” message, which is so frustrating. All the factors have more than two levels, but the error is still showing up. Probably the reason is NA values. I checked with “> sapply(train.glm, function(x) if (is.factor(x)) length(levels(x)) else NA)” and it showed two variables with NA. But when running “is.na(data)”, no NA values were shown.
    Can you please put up a solution for when the “error in contrast” is due to NA values?

  • Hi, when I ran the value string, I got that all variables have 2 or more values, but the error still persists. Is there anything I am missing?

  • Hello Joachim! I used the lm function, but it still shows the error. I think it is about the NAs. I tried all 6 methods to delete NAs, but none of them worked… Do you know why that is?

  • Hi Joachim! Hope this message finds you well.

    I’m facing this issue but I’m kinda confused on how to solve it since I’m using itsa.model. My code is the following:

    semana = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
    depv = c(13247362,10056516,8370314,8528859,10799576,15634141,18045081,15844996,11746581,8317168,6297086,5105649,3640796,2945964,2053475,1676356,1496004,1393066,1252430,1076479)
    interruption = c(0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
    cov_mor = c(325,235,188,158,139,178,284,375,544,883,840,785,682,520,516,376,309,266,220,180)
    cov_cas = c(16573,12819,16539,26975,58131,127696,156958,159754,178197,151411,126709,89327,68655,52442,41688,34397,35699,33510,37615,42079)

    x <- as.data.frame(cbind(semana, depv, interruption, cov_mor, cov_cas))

    itsa.model(data=x, time="semana", depvar="depv", interrupt_var = "interruption",
    covariates = "cov_mor","cov_cas", alpha=0.05, bootstrap=TRUE)

    Could you please help with the steps to fix it? Let me know if you need further info. Your help will be much appreciated!

    Best,
    Gabi

    • Hey Gabrielle,

      I have just tried to solve this error message, but I wasn’t able to do so. Unfortunately, I’m not an expert on the its.analysis package.

      However, I have recently created a Facebook discussion group where people can ask questions about R programming and statistics. Could you post your question there? This way, others can contribute/read as well: https://www.facebook.com/groups/statisticsglobe

      Regards,
      Joachim

