R Error in lm.fit(x, y, offset, singular.ok, …) : NA/NaN/Inf in ‘x’ (2 Examples)
In this R tutorial you’ll learn how to deal with the “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’”.
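The tutorial is structured as follows: Example 1 shows the error for data containing NA, Inf and NaN values, and Example 2 shows the error caused by a wrongly specified target variable.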
Let’s get started!
Example 1: Data Contains NA, Inf & NaN
The first step is to construct some data that we can use in the following example:
set.seed(52389374)                             # Create example data
data <- data.frame(y = rnorm(100),
                   x = c(NA, Inf, NaN, rnorm(97)))
head(data)                                     # Head of example data
As you can see based on Table 1, our example data is a data frame consisting of 100 rows and two columns. The first three values of the variable x are NA, Inf and NaN.
Based on these data, we can reproduce the “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’” in the R programming language.
Let’s assume that we want to estimate a linear model based on our data. We would typically apply the lm function as shown below:
lm(y ~ x, data)                                # Try to apply lm function
# Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
#   NA/NaN/Inf in 'x'
Unfortunately, the RStudio console returns the message “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’”.
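The reason for this is that our data contains Inf values. By default, lm() removes rows with missing values before fitting the model, and this includes NaN, because is.na(NaN) is TRUE. Inf, however, is not treated as missing, so it is passed on to lm.fit(), which stops with this error.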
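To see why only Inf slips through, it helps to compare how R’s test functions classify these special values. A short sketch (the vector vals is made up for illustration); with the default na.action = na.omit, lm() drops exactly the rows for which is.na() is TRUE:

```r
# Special values as they appear in the example data, plus -Inf and a regular number
vals <- c(NA, Inf, NaN, -Inf, 1.5)

is.na(vals)        # TRUE FALSE TRUE FALSE FALSE: NA and NaN count as missing
is.nan(vals)       # FALSE FALSE TRUE FALSE FALSE: only NaN is "not a number"
is.infinite(vals)  # FALSE TRUE FALSE TRUE FALSE: Inf and -Inf are not missing
is.finite(vals)    # FALSE FALSE FALSE FALSE TRUE: only the regular number passes
```

Because is.finite() is FALSE for all four special values, it is a convenient single test when preparing data for lm().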
So how can we solve this problem?
One way is to replace the Inf values (and, for consistency, also the NaN values) in our data frame by NA:
data_new <- data                                     # Duplicate data
data_new[is.na(data_new) | data_new == "Inf"] <- NA  # Replace NaN & Inf with NA
The previous R programming syntax has created a new data frame called data_new that contains NA values in place of NaN and Inf.
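The comparison data_new == "Inf" works because numbers are converted to text before the comparison, and Inf becomes the string "Inf". It does not catch -Inf, though. A sketch of a slightly more general variant that applies is.finite() to every numeric column (data_clean and num_cols are illustrative names, not part of the tutorial):

```r
# Recreate the example data
set.seed(52389374)
data <- data.frame(y = rnorm(100),
                   x = c(NA, Inf, NaN, rnorm(97)))

# Replace every non-finite value (NA, NaN, Inf, -Inf) in numeric columns by NA
data_clean <- data
num_cols <- vapply(data_clean, is.numeric, logical(1))
data_clean[num_cols] <- lapply(data_clean[num_cols], function(v) {
  v[!is.finite(v)] <- NA
  v
})

colSums(is.na(data_clean))  # number of NA values per column: y 0, x 3
```

Non-numeric columns (e.g. character variables) are left untouched, since is.finite() is not meaningful for them.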
Now, we can apply the lm function to this new data frame:
lm(y ~ x, data_new)                            # Properly apply lm function
# Call:
# lm(formula = y ~ x, data = data_new)
#
# Coefficients:
# (Intercept)            x
#   -0.043774    -0.001974
Works fine!
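The printed output does not mention it, but lm() silently dropped the rows containing NA before fitting. A short check of this (mod and mod_sub are illustrative names), together with an alternative that skips the data copy and passes is.finite() to the subset argument of lm():

```r
set.seed(52389374)
data <- data.frame(y = rnorm(100),
                   x = c(NA, Inf, NaN, rnorm(97)))

data_new <- data
data_new[is.na(data_new) | data_new == "Inf"] <- NA

mod <- lm(y ~ x, data_new)
nobs(mod)               # 97 observations were used
length(na.action(mod))  # 3 rows were removed by na.omit

# Alternative: keep the original data and only use rows with a finite x
mod_sub <- lm(y ~ x, data, subset = is.finite(x))
all.equal(coef(mod), coef(mod_sub))  # TRUE: both models use the same 97 rows
```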
Example 2: Wrong Target Variable in Linear Regression Model
Another possible cause of the error message “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’” is that the target and predictor variables in the lm() function are not specified properly, for example when a non-numeric variable is used as the target.
Let’s create another example data frame to illustrate that in practice:
set.seed(3334568)                              # Create example data
data2 <- data.frame(x = LETTERS[1:3],
                    y = runif(90))
head(data2)                                    # Head of example data
#   x          y
# 1 A 0.47224122
# 2 B 0.14032087
# 3 C 0.15323529
# 4 A 0.08266449
# 5 B 0.10149550
# 6 C 0.68558516
Our data frame contains two variables. The variable y is our numeric outcome, and the character variable x (with the values A, B and C) is our predictor.
Let’s try to estimate a linear regression model:
my_mod1 <- lm(x ~ y, data2)                    # Try to estimate model
# Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
#   NA/NaN/Inf in 'y'
# In addition: Warning message:
# In storage.mode(v) <- "double" : NAs introduced by coercion
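As you can see, the previous R code has returned an error and a warning message. Note that this time the error refers to ‘y’ instead of ‘x’: in lm.fit(), y is the response and x is the matrix of predictors.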
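The reason for this is that we have specified our dependent and independent variables in the wrong order, i.e. we have tried to use the character variable x as the target variable. lm() needs a numeric target, so R tries to convert the letters A, B and C to numbers. This conversion produces NA values (hence the warning), which then trigger the error.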
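The coercion step can be reproduced outside of lm(). A sketch; stringsAsFactors = FALSE, the default since R 4.0.0, is set explicitly so that x is a character variable in any R version (y_coerced is an illustrative name):

```r
set.seed(3334568)
data2 <- data.frame(x = LETTERS[1:3], y = runif(90),
                    stringsAsFactors = FALSE)

# Check the type of each variable before choosing the response
sapply(data2, class)   # x is "character", y is "numeric"

# lm() converts the response to numbers; letters cannot be converted
y_coerced <- as.numeric(data2$x)   # warning: NAs introduced by coercion
sum(is.na(y_coerced))              # 90: every value became NA
```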
Let’s fix this:
my_mod2 <- lm(y ~ x, data2)                    # Properly estimate model
The previous R code has specified y as the target variable (i.e. on the left side of the ~). This works fine without any error messages.
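With y as the response, the character variable x sits on the right-hand side of the formula, where lm() turns it into dummy variables with A as the reference level. A quick check of this (again setting stringsAsFactors = FALSE explicitly):

```r
set.seed(3334568)
data2 <- data.frame(x = LETTERS[1:3], y = runif(90),
                    stringsAsFactors = FALSE)

my_mod2 <- lm(y ~ x, data2)
names(coef(my_mod2))  # "(Intercept)" "xB" "xC"
nobs(my_mod2)         # 90
```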
Video & Further Resources
Do you need more information on the R code of this tutorial? Then I recommend watching the following video on my YouTube channel. In the video, I show the R code of this article in RStudio.
The YouTube video will be added soon.
Furthermore, you may read the related tutorials which I have published on my website. You can find a selection of articles on topics such as vectors, coding errors, and missing data below:
- Replace Inf with NA in Vector & Data Frame
- Help – Error in if (NA) { : missing value where TRUE/FALSE needed
- Errors & Warnings in R
- R Programming Tutorials
In this R programming tutorial you have learned how to get rid of the “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in ‘x’”. Let me know in the comments below if you have further questions. Furthermore, please subscribe to my email newsletter to get updates on new tutorials.
10 Comments
I have applied this technique but was unsuccessful. My dataset is shown below. There are no NA/NaN/Inf values.
'data.frame': 300 obs. of 6 variables:
 $ X           : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Age         : num 49.5 42.3 59.4 46.2 26.5 ...
 $ Sex         : chr "M" "F" "M" "F" ...
 $ Height      : num 178 165 172 166 164 ...
 $ ReactionTime: num 501 415 644 371 508 ...
 $ AGE_GROUP   : chr "40-70" "40-70" "40-70" "40-70" ...
lm(AGE_GROUP ~ ReactionTime, dataset)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'y'
Any help would be much appreciated. Thanks.
Hey Kafil,
Thanks a lot for your question. Indeed, this reason for the error message has not been covered in the tutorial yet.
I have restructured the tutorial to include your scenario as well. Please have a look at Example 2 for more details.
In short: I think you have specified your variables in the wrong order (i.e. AGE_GROUP and ReactionTime are exchanged).
Have a look at the reproducible example below:
I hope that helps!
Joachim
My target column has no NAs, but I am still getting the above error, and I have selected the correct target variable. Please suggest a solution.
Hey Preeti,
Could you illustrate the structure of your data and share your code?
Regards,
Joachim
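When the target variable looks fine, the problem can also sit in one of the predictors or in a transformation inside the formula. One way to find it is to build the model frame without dropping any rows and count the non-finite values per variable. A minimal sketch; check_lm_vars() is a hypothetical helper written for this purpose, not a function from base R or from the tutorial:

```r
# Hypothetical helper (not part of base R): for each variable in a model formula,
# report its class and how many non-finite values (NA, NaN, Inf, -Inf) it has
# after transformations such as log() have been applied.
check_lm_vars <- function(formula, data) {
  mf <- model.frame(formula, data, na.action = na.pass)  # keep all rows
  data.frame(
    variable    = names(mf),
    class       = vapply(mf, function(v) class(v)[1], character(1)),
    n_nonfinite = vapply(mf, function(v) {
      if (is.numeric(v)) sum(!is.finite(v)) else NA_integer_
    }, integer(1)),
    row.names   = NULL
  )
}

# Usage with the example data from Example 1
set.seed(52389374)
data <- data.frame(y = rnorm(100),
                   x = c(NA, Inf, NaN, rnorm(97)))
res <- check_lm_vars(y ~ x, data)
res  # y has no non-finite values, x has 3
```

A non-numeric variable shows up with its class (e.g. "character") and NA in the n_nonfinite column, which points to the situation in Example 2.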
Hi Joachim,
I had this same warning but my situation doesn’t seem to fit either of your examples.
My code was
reg1_log <- lm(log(sales_MSK) ~ log(price_MSK),
data = da_msk)
My data looks like this:
week store month price_MSK price_LT price_SP store_prom leaflet_ad store_prom_SP leaflet_ad_SP store_prom_LT sales_MSK
1 1 1 2 17.57 19.26 14.54 0 0 0 0 1 2
2 2 1 2 17.57 19.26 14.54 0 0 0 0 0 3
3 3 1 2 17.57 19.26 14.54 0 0 0 0 0 4
4 4 1 2 17.57 19.26 15.29 0 1 0 0 0 6
5 5 1 3 17.57 19.26 15.29 0 0 0 0 0 2
6 6 1 3 17.57 19.26 15.29 1 1 0 0 0 10
7 7 1 3 17.57 19.26 15.29 0 0 0 0 0 3
8 8 1 3 17.57 19.26 15.29 0 0 0 0 0 5
9 9 1 4 17.57 19.26 15.29 0 0 0 0 0 2
10 10 1 4 17.57 19.26 15.29 0 0 0 0 0 3
Hi Jean,
I have tried to reproduce your problem, but with your provided data it works fine for me. Have a look at the example code below:
Have you done anything differently from what I did?
Regards,
Joachim
Hi Joachim,
Thanks so much for your kind reply. In the end, I figured out that in some years the sales were zero, and that caused the problem. I ended up using sales + 1 instead of sales to solve it.
Thanks again for getting back to me 🙂
Ah I see, glad you found a solution! 🙂
Regards,
Joachim
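The problem Jean describes can be illustrated with a small made-up data set (da_msk below is hypothetical and contains one zero-sales observation): log(0) is -Inf, which lm() does not drop, so the fit fails. log(sales + 1), written here with log1p(), avoids this. Since adding 1 changes the scale of the response, an alternative is to keep log(sales) and exclude the zero-sales rows via the subset argument:

```r
# Hypothetical data: one observation with zero sales
da_msk <- data.frame(price_MSK = c(2, 2, 3, 3, 4, 4),
                     sales_MSK = c(2, 3, 0, 6, 2, 10))

log(da_msk$sales_MSK)  # the third value is -Inf

# lm(log(sales_MSK) ~ log(price_MSK), data = da_msk) stops with
# "NA/NaN/Inf in 'y'" because of that -Inf value

# Option 1: log(sales + 1), computed with log1p()
reg_plus1 <- lm(log1p(sales_MSK) ~ log(price_MSK), data = da_msk)

# Option 2: keep log(sales) and fit on the observations with positive sales only
reg_pos <- lm(log(sales_MSK) ~ log(price_MSK), data = da_msk,
              subset = sales_MSK > 0)

nobs(reg_plus1)  # 6
nobs(reg_pos)    # 5
```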
Hi Joachim,
I’m running the following for two datasets, one works fine, but the other I get the error message:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  NA/NaN/Inf in 'x'
Code:
Group_of_Models %
ungroup()%>%
select(-Biomass, -Dataset, -Month, -Year, -DOY, -Station_Number, -Depth)%>%
map(~lm(log(dataset$Biomass) ~ .x, data = dataset))
Neither dataset contains Inf values, only NaNs; however, the error only appears for one of them. Even when I remove the NaNs, the models that contain log(.x) are ignored and not run. I’ve honestly tried pretty much every solution I could find. I’m not sure what else to do, so any help is greatly appreciated! It’s important to note that this same code was running just fine three weeks ago. I’ve also tried running it on different computers, and the same message appears.
Best!
Hey Lívia,
Please excuse the late response. I was on a long holiday, so unfortunately I wasn’t able to reply sooner. Do you still need help with your code?
Regards,
Joachim
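Lívia’s question remains open in the thread. One possible cause (an assumption, not confirmed above) is that some predictors contain zeros or negative values: log(0) is -Inf and log() of a negative number is NaN, and only the NaN is dropped by lm(). A base-R sketch with a made-up data set (the columns Temp and Nitrate and the helper fit_log_model() are hypothetical) that counts such values per predictor and fits each log-log model only on rows where both transformed variables are finite:

```r
# Hypothetical stand-in for the original data: a Biomass response plus predictors
dataset <- data.frame(
  Biomass = c(1.2, 0.8, 2.5, 1.9, 3.1, 0.5),
  Temp    = c(12.1, 13.4, 15.0, 14.2, 16.8, 11.9),
  Nitrate = c(0.4, 0.0, 0.7, 0.3, NaN, 0.2)
)
predictors <- setdiff(names(dataset), "Biomass")

# Values <= 0 turn into -Inf or NaN under log(); count them per predictor
n_nonpos <- vapply(dataset[predictors],
                   function(v) sum(v <= 0, na.rm = TRUE), integer(1))
n_nonpos  # Temp: 0, Nitrate: 1

# Fit log(Biomass) ~ log(predictor), keeping only rows where both logs are finite
fit_log_model <- function(p) {
  d <- data.frame(log_biomass = log(dataset$Biomass),
                  log_x       = log(dataset[[p]]))
  d <- d[is.finite(d$log_biomass) & is.finite(d$log_x), ]
  lm(log_biomass ~ log_x, data = d)
}
models <- lapply(setNames(predictors, predictors), fit_log_model)
vapply(models, nobs, numeric(1))  # Temp: 6, Nitrate: 4
```

Whether dropping those rows or shifting the values before taking the log is more appropriate depends on the data.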