R Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y

 

In this tutorial, I’ll explain how to reproduce and fix the “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” in the R programming language.

Let’s dig in…

 

Creation of Example Data

At the start, let’s construct some example train data for our linear regression model:

set.seed(54136278)                                        # Set random seed
data_train <- data.frame(x = letters[1:3],                # Create train data set
                         y = rnorm(9))
data_train                                                # Print train data set

 

Table 1: The example train data frame with the columns x and y.

 

Table 1 shows the structure of our example data: It consists of nine rows and two columns. The variable x is a character variable that will be used as a predictor. The variable y is numerical and will be used as the target variable.
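
If you want to double-check this structure yourself, a small sketch like the following may help. It rebuilds the same data frame; the stringsAsFactors = FALSE argument is my own addition (not part of the original code) so that x is a character column regardless of the R version, because data.frame() converted strings to factors by default before R 4.0.0.

```r
set.seed(54136278)                                        # Same seed as above
data_train <- data.frame(x = letters[1:3],                # x is recycled to nine rows
                         y = rnorm(9),
                         stringsAsFactors = FALSE)        # Assumption: keep x as character

dim(data_train)                                           # Number of rows and columns
sapply(data_train, class)                                 # Class of each column
```

lm() handles a character predictor like a factor internally, so the model below treats a, b, and c as the levels of x.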

Let’s conduct a linear regression based on our data:

my_mod <- lm(y ~ x, data_train)                           # Estimate linear regression model
summary(my_mod)                                           # Summary statistics of regression model
# Call:
# lm(formula = y ~ x, data = data_train)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -0.8830 -0.4090 -0.2373  0.4574  0.8066 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.23076    0.40321   0.572    0.588
# xb           0.03699    0.57022   0.065    0.950
# xc          -0.92006    0.57022  -1.614    0.158
# 
# Residual standard error: 0.6984 on 6 degrees of freedom
# Multiple R-squared:  0.3761,	Adjusted R-squared:  0.1681 
# F-statistic: 1.808 on 2 and 6 DF,  p-value: 0.2429

Looks good. Next, I’ll show why the error message “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” occurs.

 

Example 1: Reproduce the Error in model.frame.default – factor x has new levels

This example shows how to replicate the “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y”.

First, we have to create a test data set:

data_test <- data.frame(x = letters[1:4])                 # Create test data set
data_test                                                 # Print test data set

 

Table 2: The test data frame with the single column x containing the values a, b, c, and d.

 

By executing the previously shown R programming syntax, we have constructed Table 2, i.e. a data frame containing only our predictor variable x.

Now, let’s assume that we would like to apply the predict function to our test data to return some predictions:

predict(my_mod, data_test)                                # predict() function returns error message
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
#   factor x has new levels d

Damn – The “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” was returned.

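This happened because the predictor variable x contains a level in the test data that the model never saw during fitting. More precisely, the train data contained the levels a, b, and c, whereas the test data also contains the level d. The model has no coefficient for d, so the predict function cannot compute a prediction for that row.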
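
To see the mismatch directly, you can compare the levels stored in the fitted model with the values in the test data. The following is only a sketch that recreates the objects from above; the name new_levels is my own. lm() keeps the levels of categorical predictors in the xlevels component of the model object, which is what the xlev = object$xlevels part of the error message refers to.

```r
set.seed(54136278)                                        # Recreate the objects from above
data_train <- data.frame(x = letters[1:3],
                         y = rnorm(9))
my_mod <- lm(y ~ x, data_train)
data_test <- data.frame(x = letters[1:4])

my_mod$xlevels$x                                          # Levels known to the model
new_levels <- setdiff(unique(as.character(data_test$x)),  # Values in test data that
                      my_mod$xlevels$x)                   # the model has never seen
new_levels                                                # Returns "d"
```

Running such a check before calling predict() shows which values would trigger the error.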

So, what should we do now? This is what I’m going to show next!

 

Example 2: Debug the Error in model.frame.default – factor x has new levels

First of all, we should check why this difference between the train and test data occurred. Is there a logical explanation why one of the levels in the predictor variable is missing in our train data set? The best solution is to find that reason and modify the data accordingly.

However, in case there’s no way to change your data based on logical reasoning, there’s one hard programming solution you might consider.

The following R code replaces every value of x in our test data set by NA if its level didn’t exist in our train data:

data_test_new <- data_test                                # Duplicate test data set
data_test_new$x[which(!(data_test_new$x %in% unique(data_train$x)))] <- NA  # Replace new levels by NA
data_test_new                                             # Print updated test data set

 

Table 3: The updated test data frame, in which the value d of x has been replaced by NA.

 

Table 3 shows the output of the previously shown code – As you can see, we have replaced the character d by NA.

In the next step, we can apply the predict function to our updated test data frame:

predict(my_mod, data_test_new)                            # Apply predict without errors
#          1          2          3          4 
#  0.2307644  0.2677586 -0.6892992         NA

The predictions for those cases with the additional level in the predictor variable are also NA. However, this time, it worked without any error messages.
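
If your model contains several categorical predictors, the same idea can be wrapped in a small helper. The function below is a hypothetical sketch (replace_unseen_levels is my own name, not a base R function). It assumes that each categorical predictor appears in the formula under its plain column name, and it skips any entry of xlevels that does not match a column in the new data.

```r
# Hypothetical helper: set values to NA that the model has not seen during fitting
replace_unseen_levels <- function(model, newdata) {
  for (var in names(model$xlevels)) {                     # Loop over categorical predictors
    if (!(var %in% names(newdata))) next                  # Skip terms such as factor(x)
    unseen <- !(newdata[[var]] %in% model$xlevels[[var]]) # TRUE for unseen values
    newdata[[var]][unseen] <- NA                          # Replace unseen values by NA
  }
  newdata
}

set.seed(54136278)                                        # Recreate the objects from above
data_train <- data.frame(x = letters[1:3],
                         y = rnorm(9))
my_mod <- lm(y ~ x, data_train)
data_test <- data.frame(x = letters[1:4])

data_test_new <- replace_unseen_levels(my_mod, data_test) # Apply helper to test data
pred <- predict(my_mod, data_test_new)                    # Fourth prediction is NA
pred[!is.na(pred)]                                        # Keep only available predictions
```

Whether you keep the NA predictions or drop them as in the last line depends on how the predictions are used afterwards.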

 

Video & Further Resources


At this point of the article, you should know how to deal with the “Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor X has new levels Y” in R programming. Please let me know in the comments section if you have additional questions or comments.

 
