R Error: contrasts can be applied only to factors with 2 or more levels
This tutorial illustrates how to handle the error message “contrasts can be applied only to factors with 2 or more levels” in the R programming language.
Here’s how to do it…
Creation of Exemplifying Data
The following data will be used as the basis for this R programming tutorial:
data <- data.frame(x1 = c(1, 4, 3, 1, 5, 5),    # Create example data
                   x2 = c(7, 7, 7, 1, 1, 2),
                   x3 = as.factor(5),
                   y = c(4, 3, 2, 5, 5, 1))
data                                            # Print example data
#   x1 x2 x3 y
# 1  1  7  5 4
# 2  4  7  5 3
# 3  3  7  5 2
# 4  1  1  5 5
# 5  5  1  5 5
# 6  5  2  5 1
As you can see based on the previous output of the RStudio console, the example data has six rows and four columns. The variables x1, x2, and x3 are our predictors and the variable y is our target variable.
Example 1: Reproduce the Error: contrasts can be applied only to factors with 2 or more levels
The following R code shows how to replicate the error message “contrasts can be applied only to factors with 2 or more levels”.
Let’s assume that we want to estimate a linear model of our data using the lm function in R:
lm(y ~ ., data)                                 # Trying to apply lm()
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
#   contrasts can be applied only to factors with 2 or more levels
As you can see, the lm function returned the error “contrasts can be applied only to factors with 2 or more levels” to the RStudio console.
The reason for this is that one of our predictor variables has only one factor level.
Have a look at the column x3. As you can see, this column consists only of the value 5.
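A quick way to confirm this is to ask R directly which columns are factors and how many levels they have. The following is a minimal sketch that recreates the example data from this tutorial and uses only base R functions:

```r
# Example data from this tutorial
data <- data.frame(x1 = c(1, 4, 3, 1, 5, 5),
                   x2 = c(7, 7, 7, 1, 1, 2),
                   x3 = as.factor(5),
                   y = c(4, 3, 2, 5, 5, 1))

is_factor_col <- sapply(data, is.factor)        # Which columns are factors?
is_factor_col

nlevels(data$x3)                                # Number of factor levels in x3
levels(data$x3)                                 # The factor levels themselves
```

For this data, only x3 is a factor, and it has the single level "5".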
Please note that this error message might also occur when your data contains NA values, even when all of your factor columns consist of more than one factor level.
The reason for this is that the lm function performs listwise deletion to remove all rows with NA values from your data.
If a factor column in the remaining complete rows has only one factor level left, the error message “contrasts can be applied only to factors with 2 or more levels” appears.
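The NA case described above can be sketched with a small made-up data frame. The object names data_na, data_complete, and result are hypothetical and only serve this illustration. The factor group has two levels, but the only row with the level "b" also contains an NA value, so that row is dropped before the model is fitted:

```r
# Hypothetical data: the only row with level "b" has an NA in x1
data_na <- data.frame(x1 = c(1, 4, 3, 1, 5, NA),
                      group = factor(c("a", "a", "a", "a", "a", "b")),
                      y = c(4, 3, 2, 5, 5, 1))

nlevels(data_na$group)                          # Two levels in the raw data

data_complete <- droplevels(na.omit(data_na))   # Complete rows, unused levels dropped
nlevels(data_complete$group)                    # Only one level remains

# Capture the error message instead of stopping the script
result <- tryCatch(lm(y ~ ., data_na),
                   error = function(e) conditionMessage(e))
result
```

Checking the number of levels on the raw data is therefore not enough. The check should be done on the rows that remain after the NA values have been removed.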
So how can we solve this problem? That’s what I’m going to show next!
Example 2: Fix the Error: contrasts can be applied only to factors with 2 or more levels
Example 2 shows how to deal with the error message “contrasts can be applied only to factors with 2 or more levels”.
As explained in Example 1, this error occurs due to one-level factor variables. So the first step is to identify those variables in our data.
We can do that by using the sapply and lapply functions in combination with the unique and length functions:
values_count <- sapply(lapply(data, unique), length)  # Identify variables with 1 value
values_count                                          # Print counts of different values
# x1 x2 x3  y
#  4  3  1  5
The previous R code returned a named vector showing the number of different values in each of our columns. As you can see, the variable x3 contains the same value in each data cell.
We can now use this vector to subset our data frame within the lm function so that only variables with more than one value are used as predictors:
lm(y ~ ., data[ , values_count > 1])            # Apply lm() to subset of data
# Call:
# lm(formula = y ~ ., data = data[, values_count > 1])
#
# Coefficients:
# (Intercept)           x1           x2
#      5.8534      -0.4788      -0.2409
The lm function returns valid output. Looks good!
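If your data may contain NA values, the same idea can be applied to the complete rows only. The following is a minimal sketch, not a definitive implementation: find_one_level_factors is a hypothetical helper written for this illustration (it is not part of base R or any package), and it assumes that all columns of the data frame enter the model, as with the formula y ~ . used above.

```r
# Example data from this tutorial
data <- data.frame(x1 = c(1, 4, 3, 1, 5, 5),
                   x2 = c(7, 7, 7, 1, 1, 2),
                   x3 = as.factor(5),
                   y = c(4, 3, 2, 5, 5, 1))

# Hypothetical helper: names of factor or character columns that have
# fewer than two distinct values once rows with NA values are removed
find_one_level_factors <- function(df) {
  df_complete <- na.omit(df)                    # Keep complete rows only
  is_cat <- vapply(df_complete,
                   function(col) is.factor(col) || is.character(col),
                   logical(1))
  n_values <- vapply(df_complete[is_cat],
                     function(col) length(unique(col)),
                     integer(1))
  names(n_values)[n_values < 2]
}

one_level <- find_one_level_factors(data)
one_level                                       # "x3"

# Drop the flagged columns and fit the model on the remaining variables
data_fixed <- data[ , !(names(data) %in% one_level), drop = FALSE]
fit <- lm(y ~ ., data_fixed)
coef(fit)
```

In contrast to the values_count approach, this sketch only flags factor and character columns, so numeric columns are left in the data even if they are constant.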
Video & Further Resources
Have a look at the following video on my YouTube channel. I explain the content of this article in the video tutorial.
Furthermore, you may have a look at the other articles on my website.
In this tutorial you have learned how to deal with the error “contrasts can be applied only to factors with 2 or more levels” in the R programming language. Don’t hesitate to let me know in the comments if you have additional questions or comments.
8 Comments
Hello Joachim! Excellent explanation as always.
I am making a logistic regression model in R and having the same “error in contrast” thing, which is so frustrating. All the factors have more than two levels, but this error is still showing up. Probably the reason is NA values. I confirmed it with “> sapply(train.glm, function(x) if (is.factor(x)) length(levels(x)) else NA)” and it was showing two variables with NA. But while performing “is.na(data)” no NA values were shown.
Can you please put up a solution for when the “error in contrast” is due to NA values?
Hi Mayur,
Thank you for the very kind words!
I recommend creating a subset of your data, where all rows with NA values have been removed. You can learn how to do that here: https://statisticsglobe.com/r-remove-data-frame-rows-with-some-or-all-na
Afterwards, you can use the code shown in this tutorial to identify variables with only one factor level (i.e. Example 2).
I hope that helps!
Joachim
Hi, when I ran the values count code, I got that all variables have 2 or more values, but the error still persists. Is there anything I am missing?
Hey Swati,
Could you share the code you have used?
Regards,
Joachim
Hello Joachim! I used the lm function but it still shows the error. I think it is about the NAs. I tried all the 6 methods to delete NA, but none of them worked… Do you know why that is?
Hey Guan,
Could you share the code you have used?
Regards,
Joachim
Hi Joachim! Hope this message finds you well.
I’m facing this issue but I’m kinda confused on how to solve it since I’m using itsa.model. My code is the following:
semana = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
depv = c(13247362,10056516,8370314,8528859,10799576,15634141,18045081,15844996,11746581,8317168,6297086,5105649,3640796,2945964,2053475,1676356,1496004,1393066,1252430,1076479)
interruption = c(0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
cov_mor = c(325,235,188,158,139,178,284,375,544,883,840,785,682,520,516,376,309,266,220,180)
cov_cas = c(16573,12819,16539,26975,58131,127696,156958,159754,178197,151411,126709,89327,68655,52442,41688,34397,35699,33510,37615,42079)
x <- as.data.frame(cbind(semana, depv, interruption, cov_mor, cov_cas))
itsa.model(data=x, time="semana", depvar="depv", interrupt_var = "interruption",
covariates = "cov_mor","cov_cas", alpha=0.05, bootstrap=TRUE)
Could you please help with the steps to fix it? Let me know if you need further info. Your help will be much appreciated!
Best,
Gabi
Hey Gabrielle,
I have just tried to solve this error message, but I wasn’t able to do so. Unfortunately, I’m not an expert on the its.analysis package.
However, I have recently created a Facebook discussion group where people can ask questions about R programming and statistics. Could you post your question there? This way, others can contribute/read as well: https://www.facebook.com/groups/statisticsglobe
Regards,
Joachim