R Error in lm.fit(x, y, offset, singular.ok, …) : NA/NaN/Inf in ‘x’ (2 Examples)

 

In this R tutorial you’ll learn how to deal with the “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’”.

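The tutorial is structured as follows:

  • Example 1: Data Contains NA, Inf & NaN
  • Example 2: Wrong Target Variable in Linear Regression Model
  • Video & Further Resources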

Let’s get started!

 

Example 1: Data Contains NA, Inf & NaN

The first step is to construct some data that we can use in the following example:

set.seed(52389374)                                   # Create example data
data <- data.frame(y = rnorm(100),
                   x = c(NA, Inf, NaN, rnorm(97)))
head(data)                                           # Head of example data

 

Table 1: The first six rows of the example data frame, with the columns y and x.

 

As you can see based on Table 1, our example data is a data frame consisting of 100 rows and two columns.

Based on these data, we can replicate the “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’” in the R programming language.

Let’s assume that we want to estimate a linear model based on our data. Then, we would typically apply the lm function as shown below:

lm(y ~ x, data)                                      # Try to apply lm function
# Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
#   NA/NaN/Inf in 'x'

Unfortunately, the RStudio console returns the message “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’”.

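The reason for this is that our data contains an Inf value. By default, the lm function drops rows with NA values before fitting (this includes NaN, which is.na() also flags), but infinite values are kept and passed on to lm.fit, which cannot handle them.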
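Before changing anything, it can help to check which kinds of special values a column actually contains. The following lines are a minimal sketch on a hypothetical vector x and a hypothetical data frame df (neither is the tutorial data); they only use the base R functions is.na, is.nan, is.infinite, and is.finite:

```r
x <- c(NA, Inf, NaN, -Inf, 1.5)                      # Hypothetical vector for illustration

is.na(x)                                             # TRUE for NA and NaN
is.nan(x)                                            # TRUE for NaN only
is.infinite(x)                                       # TRUE for Inf and -Inf
is.finite(x)                                         # TRUE only for ordinary numbers

df <- data.frame(y = c(1, 2, 3, 4, 5), x = x)        # Hypothetical data frame
non_finite <- sapply(df, function(col) sum(!is.finite(col)))
non_finite                                           # Count of non-finite values per column
```

A count greater than zero tells us which column needs a closer look before we call lm().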

So how can we solve this problem?

One solution is to replace the NaN and Inf values in our data frame with NA:

data_new <- data                                     # Duplicate data
data_new$x[!is.finite(data_new$x)] <- NA             # Replace NaN, Inf & -Inf with NA

The previous R programming syntax has created a new data frame called data_new that contains NA values instead of NaN and Inf.

Now, we can apply the lm function to this new data frame:

lm(y ~ x, data_new)                                  # Properly apply lm function
# Call:
# lm(formula = y ~ x, data = data_new)
# 
# Coefficients:
# (Intercept)            x  
#   -0.043774    -0.001974

Works fine!
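As an alternative, we may leave the data frame untouched and exclude the problematic rows when fitting the model. The sketch below assumes a small hypothetical data frame d (not the tutorial data) and uses the subset argument of lm() together with is.finite(). It should give the same coefficients as replacing the non-finite values with NA first:

```r
set.seed(1)                                          # Hypothetical data, not the tutorial's
d <- data.frame(y = rnorm(20),
                x = c(NA, Inf, NaN, -Inf, rnorm(16)))

mod_subset <- lm(y ~ x, data = d,                    # Fit on rows with finite x only
                 subset = is.finite(x))

d_clean <- d                                         # Same idea via replacement with NA
d_clean$x[!is.finite(d_clean$x)] <- NA
mod_clean <- lm(y ~ x, data = d_clean)

all.equal(coef(mod_subset), coef(mod_clean))         # Compare both sets of coefficients
```

Which variant is preferable is a matter of taste: the subset argument keeps the original data intact, while the replacement makes the cleaning step visible in the data itself.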

 

Example 2: Wrong Target Variable in Linear Regression Model

A closely related error message, “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘y’”, occurs when the target and predictor variables in the lm() function are not specified properly.

Let’s create another example data frame to illustrate that in practice:

set.seed(3334568)                       # Create example data
data2 <- data.frame(x = LETTERS[1:3],
                    y = runif(90))
head(data2)                             # Head of example data
#   x          y
# 1 A 0.47224122
# 2 B 0.14032087
# 3 C 0.15323529
# 4 A 0.08266449
# 5 B 0.10149550
# 6 C 0.68558516

Our data frame contains two variables. The variable y is our outcome, and the variable x is our predictor.

Let’s try to estimate a linear regression model:

my_mod1 <- lm(x ~ y, data2)             # Try to estimate model
# Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
#   NA/NaN/Inf in 'y'
# In addition: Warning message:
# In storage.mode(v) <- "double" : NAs introduced by coercion

As you can see, the previous R code has returned an error and a warning message.

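The reason for this is that we have specified our dependent and independent variables in the wrong order, i.e. we have tried to use the character variable x as target variable. When the lm function converts this character variable to numeric, every value becomes NA, which explains both the warning and the error.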
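A quick check of the column classes can reveal this kind of mix-up before the model is estimated. The following lines are a minimal sketch, assuming R 4.0 or later (where data.frame() keeps strings as characters); the data frame d2 and the object target are hypothetical names used only for illustration:

```r
set.seed(1)                                          # Hypothetical data, not the tutorial's
d2 <- data.frame(x = LETTERS[1:3],
                 y = runif(90))

sapply(d2, class)                                    # Class of each column

target <- "x"                                        # Hypothetical name of the intended outcome
if (!is.numeric(d2[[target]])) {                     # Flag a non-numeric target variable
  message("'", target, "' is not numeric; check the order of the variables in the formula")
}
```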

Let’s fix this:

my_mod2 <- lm(y ~ x, data2)             # Properly estimate model

The previous R code has specified y as the target variable (i.e. on the left side of the ~). This works fine without any error messages.
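It may also be useful to see what the lm function does with a character predictor: it treats the variable as categorical and estimates one coefficient per category except the first. The sketch below uses a hypothetical data frame d2 with the same structure as above and inspects the coefficient names:

```r
set.seed(1)                                          # Hypothetical data, not the tutorial's
d2 <- data.frame(x = LETTERS[1:3],
                 y = runif(90))

mod <- lm(y ~ x, d2)                                 # Character predictor is dummy coded
names(coef(mod))                                     # "(Intercept)" "xB" "xC"
```

Here, category A serves as the reference level, so xB and xC describe differences relative to A.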

 

Video & Further Resources

Do you need more information on the R programming codes of this tutorial? Then I recommend watching the following video of my YouTube channel. In the video, I show the R code of this article in RStudio.

 

The YouTube video will be added soon.

 


In this R programming tutorial you have learned how to get rid of the “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’”. Let me know in the comments below if you have further questions. Furthermore, please subscribe to my email newsletter in order to get updates on new tutorials.

 



Comments

  • I have applied this technique but was unsuccessful. My dataset as below. There is no NA/NaN/Inf value.

    'data.frame': 300 obs. of 6 variables:
    $ X : int 1 2 3 4 5 6 7 8 9 10 ...
    $ Age : num 49.5 42.3 59.4 46.2 26.5 ...
    $ Sex : chr "M" "F" "M" "F" ...
    $ Height : num 178 165 172 166 164 ...
    $ ReactionTime : num 501 415 644 371 508 ...
    $ AGE_GROUP : chr "40-70" "40-70" "40-70" "40-70" ...

    lm(AGE_GROUP ~ ReactionTime, dataset)
    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'y'

    Any help would be much appreciated. Thanks.

    • Hey Kafil,

      Thanks a lot for your question. Indeed, this reason for the error message has not been covered in the tutorial yet.

      I have restructured the tutorial, to include your scenario as well. Please have a look at Example 2 for more details.

      In short: I think you have specified your variables in the wrong order (i.e. AGE_GROUP and ReactionTime are exchanged).

      Have a look at the reproducible example below:

      dataset <- data.frame(AGE_GROUP = sample(c("40-70", "30-40", "20-30"),
                                               100,
                                               replace = TRUE),
                            ReactionTime = runif(100))
       
      lm(ReactionTime ~ AGE_GROUP, dataset)

      I hope that helps!

      Joachim

  • My target column has no NA’s but still I am getting the above error and I have selected the correct target variable. please suggest some solution.
