R Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y
In this tutorial, I’ll explain how to reproduce and fix the “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” in the R programming language.
Let’s dig in…
Creation of Example Data
At the start, let’s construct some example train data for our linear regression model:
set.seed(54136278)                          # Set random seed
data_train <- data.frame(x = letters[1:3],  # Create train data set
                         y = rnorm(9))
data_train                                  # Print train data set
Table 1 shows the structure of our example data: It consists of nine rows and two columns. The variable x is a character variable that will be used as a predictor; the lm function treats such a variable as a factor, which is why the error message later speaks of factor levels. The variable y is numerical and will be used as the target variable.
Let’s conduct a linear regression based on our data:
my_mod <- lm(y ~ x, data_train)             # Estimate linear regression model
summary(my_mod)                             # Summary statistics of regression model
# Call:
# lm(formula = y ~ x, data = data_train)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -0.8830 -0.4090 -0.2373  0.4574  0.8066
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.23076    0.40321   0.572    0.588
# xb           0.03699    0.57022   0.065    0.950
# xc          -0.92006    0.57022  -1.614    0.158
#
# Residual standard error: 0.6984 on 6 degrees of freedom
# Multiple R-squared:  0.3761, Adjusted R-squared:  0.1681
# F-statistic: 1.808 on 2 and 6 DF,  p-value: 0.2429
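The error message in the title of this tutorial refers to object$xlevels. As a small sketch (the data and model are re-created so that the snippet runs on its own), the following code shows where the fitted lm object stores the levels of x that it has seen during estimation:

```r
set.seed(54136278)                          # Same seed as above
data_train <- data.frame(x = letters[1:3],  # Re-create train data set
                         y = rnorm(9))
my_mod <- lm(y ~ x, data_train)             # Re-estimate linear regression model

my_mod$xlevels                              # Levels of x known to the model
# $x
# [1] "a" "b" "c"
```

Only these levels have an estimated coefficient (or belong to the reference category), so only these levels can be used for predictions.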
Looks good. Next, I’ll show why the error message “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” occurs.
Example 1: Reproduce the Error in model.frame.default – factor x has new levels
This example shows how to replicate the “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y”.
First, we have to create a test data set:
data_test <- data.frame(x = letters[1:4])   # Create test data set
data_test                                   # Print test data set
By executing the previously shown R programming syntax, we have constructed Table 2, i.e. a data frame containing only our predictor variable x.
Now, let’s assume that we would like to apply the predict function to our test data to return some predictions:
predict(my_mod, data_test)                  # predict() function returns error message
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
#   factor x has new levels d
Damn – The “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” was returned.
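This happened because the predictor variable x has one level fewer in our train data than in our test data. More precisely, the train data contains the levels a, b, and c, whereas the test data also contains the level d. The model has no estimated coefficient for d, so it cannot return a prediction for that row.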
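If you are not sure which levels cause the problem in your own data, you could compare the test data with the levels stored in the model. The following lines are a minimal sketch of this check, assuming the same example objects as in this tutorial (re-created here so that the snippet is self-contained):

```r
set.seed(54136278)                          # Same seed as above
data_train <- data.frame(x = letters[1:3],  # Re-create train data set
                         y = rnorm(9))
data_test <- data.frame(x = letters[1:4])   # Re-create test data set
my_mod <- lm(y ~ x, data_train)             # Re-estimate linear regression model

new_levels <- setdiff(unique(as.character(data_test$x)),  # Levels in test data...
                      my_mod$xlevels$x)                   # ...that the model has not seen
new_levels                                  # Print new levels
# [1] "d"
```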
So, what should we do now? This is what I’m going to show next!
Example 2: Debug the Error in model.frame.default – factor x has new levels
First of all, we should check why this difference between the train and test data occurred. Is there a logical explanation why one of the levels of the predictor variable is missing in our train data set? The best solution is to find that reason and modify the data accordingly.
However, in case there’s no way to change your data based on logical reasoning, there’s one hard-coded programming workaround you might consider.
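One possible way to start this check is to count how often each level appears in the two data sets. The code below is only a sketch based on the example data of this tutorial; the object names all_levels and level_counts are chosen for illustration:

```r
set.seed(54136278)                          # Same seed as above
data_train <- data.frame(x = letters[1:3],  # Re-create train data set
                         y = rnorm(9))
data_test <- data.frame(x = letters[1:4])   # Re-create test data set

all_levels <- sort(union(as.character(data_train$x),  # All levels in both data sets
                         as.character(data_test$x)))
level_counts <- data.frame(                 # Frequency of each level per data set
  level   = all_levels,
  n_train = as.vector(table(factor(data_train$x, levels = all_levels))),
  n_test  = as.vector(table(factor(data_test$x, levels = all_levels)))
)
level_counts                                # Print frequency table
#   level n_train n_test
# 1     a       3      1
# 2     b       3      1
# 3     c       3      1
# 4     d       0      1
```

A level with n_train equal to zero and n_test larger than zero will trigger the error when predict is applied.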
The following R code sets all observations in our test data set to NA that contain the additional level that didn’t exist in our train data:
data_test_new <- data_test                  # Duplicate test data set
data_test_new$x[which(!(data_test_new$x %in% unique(data_train$x)))] <- NA  # Replace new levels by NA
data_test_new                               # Print updated test data set
Table 3 shows the output of the previously shown code – As you can see, we have replaced the character d by NA.
In the next step, we can apply the predict function to our updated test data frame:
predict(my_mod, data_test_new)              # Apply predict without errors
#          1          2          3          4
#  0.2307644  0.2677586 -0.6892992         NA
The predictions for those cases with the additional level in the predictor variable are also NA. However, this time, it worked without any error messages.
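In case your model contains several categorical predictors, the same idea could be wrapped in a small function. The following code is a sketch under assumptions, not a definitive implementation: set_new_levels_na is a hypothetical helper name (it is not part of base R), and it assumes that the predictors enter the model formula by their plain column names (i.e. y ~ x, not y ~ factor(x)).

```r
set.seed(54136278)                                  # Same seed as above
data_train <- data.frame(x = letters[1:3],          # Re-create train data set
                         y = rnorm(9))
data_test <- data.frame(x = letters[1:4])           # Re-create test data set
my_mod <- lm(y ~ x, data_train)                     # Re-estimate linear regression model

# Hypothetical helper (not part of base R): set unseen levels to NA
set_new_levels_na <- function(model, newdata) {
  for (v in names(model$xlevels)) {                 # Loop over categorical predictors
    if (!(v %in% names(newdata))) next              # Skip terms such as factor(x)
    vals <- as.character(newdata[[v]])              # Works for factor & character columns
    vals[!(vals %in% model$xlevels[[v]])] <- NA     # Unseen levels become NA
    newdata[[v]] <- vals
  }
  newdata
}

data_test_clean <- set_new_levels_na(my_mod, data_test)  # Apply helper to test data
pred <- predict(my_mod, data_test_clean)            # Apply predict without errors
is.na(pred)                                         # Only the row with level d is NA
#     1     2     3     4
# FALSE FALSE FALSE  TRUE
```

The helper reads the known levels from the model itself instead of from the train data, so it can also be used when only the fitted model object is still available.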
Video & Further Resources
Some time ago, I published a video on the Statistics Globe YouTube channel, which demonstrates the R code of this page. You can find the video below:
The YouTube video will be added soon.
Furthermore, you might read the related R programming articles on this website.
- How to Fix the Error in R : object not interpretable as a factor
- Get All Factor Levels of Vector & Data Frame Column
- Subset Data Frame Rows Based On Factor Levels
- droplevels R Example
- Dealing with Warnings & Errors in R (Cheat Sheet)
- All R Programming Tutorials
At this point of the article, you should know how to deal with the “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” in R programming. Please let me know in the comments section if you have additional questions and/or comments.