This error occurs when trying to fit a linear regression model in R using the lm() function but either the predictor or response variables contain Not a Number (NaN) or infinity (Inf) values.
You can solve this error by replacing the NaN and Inf values with NA values, for example:
df[is.na(df) | df=="Inf"] = NA
The error can also occur if the response variable is not a continuous numeric variable, for example a Yes/No column. In that case, you may have your predictor and response variables the wrong way round, or you may need to fit a logistic regression model instead.
This tutorial will go through the error in detail and how to solve it with code examples.
Example #1: Predictor and/or Response Variable contains NaN or Inf
Consider the following data frame that contains information about the amount of ice cream sold over ten days and the temperature for each of the days in Celsius.
df <- data.frame(day=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 temperature=c(10, NA, 22, 19, 28, 15, 20, 17, 13, 30),
                 ice_cream_sold=c(5, NaN, 40, 38, 100, 40, Inf, 10, 30, 150))
df
   day temperature ice_cream_sold
1    1          10              5
2    2          NA            NaN
3    3          22             40
4    4          19             38
5    5          28            100
6    6          15             40
7    7          20            Inf
8    8          17             10
9    9          13             30
10  10          30            150
The data frame contains some NaN and Inf values.
Let’s attempt to fit a linear regression model using temperature as the predictor variable and ice_cream_sold as the response variable.
model <- lm(ice_cream_sold ~ temperature, df)
Let’s run the code to see what happens:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'y'
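The R interpreter raises an error because the response variable ice_cream_sold contains an Inf value. By default, lm() drops rows containing NA or NaN before fitting (na.action = na.omit), but Inf is not treated as missing, so it reaches the fitting routine and triggers the error.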
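To see which values are responsible, one option is to count the missing and non-finite entries in each column before fitting. The sketch below rebuilds the example data frame and uses a helper named count_non_finite, which is a hypothetical name introduced here for illustration and is not part of base R.

```r
# Rebuild the example data frame from above
df <- data.frame(
  day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  temperature = c(10, NA, 22, 19, 28, 15, 20, 17, 13, 30),
  ice_cream_sold = c(5, NaN, 40, 38, 100, 40, Inf, 10, 30, 150)
)

# Hypothetical helper: counts of NA (excluding NaN), NaN, and +/-Inf
# in a single numeric vector
count_non_finite <- function(col) {
  c(
    na  = sum(is.na(col) & !is.nan(col)),
    nan = sum(is.nan(col)),
    inf = sum(is.infinite(col))
  )
}

# One column of counts per variable in the data frame
counts <- sapply(df, count_non_finite)
counts
```

The result is a small matrix with one row per kind of value, which makes it easy to spot the column that needs cleaning.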
Solution
We can solve this error by replacing the NaN and Inf values with NA. By default, lm() drops rows that contain NA when fitting the linear regression model. Let’s look at the revised code:
df[is.na(df) | df=="Inf"] = NA
df
Let’s look at the updated data frame:
   day temperature ice_cream_sold
1    1          10              5
2    2          NA             NA
3    3          22             40
4    4          19             38
5    5          28            100
6    6          15             40
7    7          20             NA
8    8          17             10
9    9          13             30
10  10          30            150
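The comparison with "Inf" used above matches positive infinity only, so a -Inf value would be left in place. As an alternative sketch, assuming every numeric column should be cleaned the same way, is.finite() can be applied column by column. It returns FALSE for NA, NaN, Inf and -Inf, so a single test covers all four cases.

```r
# Rebuild the original (uncleaned) example data frame
df <- data.frame(
  day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  temperature = c(10, NA, 22, 19, 28, 15, 20, 17, 13, 30),
  ice_cream_sold = c(5, NaN, 40, 38, 100, 40, Inf, 10, 30, 150)
)

# Select numeric columns only; character or factor columns are left untouched
num_cols <- sapply(df, is.numeric)

# Replace every non-finite entry (NA, NaN, Inf, -Inf) with NA
df[num_cols] <- lapply(
  df[num_cols],
  function(col) replace(col, !is.finite(col), NA)
)
df
```

Restricting the replacement to numeric columns keeps the approach usable on data frames that also contain text columns.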
Now we can fit the linear regression model and get the coefficients of the model using summary():
model <- lm(ice_cream_sold ~ temperature, df)
summary(model)
Call:
lm(formula = ice_cream_sold ~ temperature, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-28.73 -15.96   2.43  15.42  31.50 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -68.127     25.983  -2.622  0.03948 * 
temperature    6.221      1.277   4.872  0.00279 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.8 on 6 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.7982,	Adjusted R-squared:  0.7646 
F-statistic: 23.73 on 1 and 6 DF,  p-value: 0.00279
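The summary reports that 2 observations were deleted due to missingness. As a quick check, the sketch below refits the model on the cleaned data and uses nobs() and na.action() from the stats package to show how many rows were used and which rows were dropped.

```r
# Cleaned example data frame (NaN and Inf already replaced with NA)
df <- data.frame(
  day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  temperature = c(10, NA, 22, 19, 28, 15, 20, 17, 13, 30),
  ice_cream_sold = c(5, NA, 40, 38, 100, 40, NA, 10, 30, 150)
)

model <- lm(ice_cream_sold ~ temperature, df)

# Number of rows actually used in the fit
n_used <- nobs(model)

# Row indices removed by the default na.action (na.omit)
dropped_rows <- as.integer(na.action(model))

n_used        # 8
dropped_rows  # 2 7
```

Comparing n_used with nrow(df) is a simple way to notice when more rows than expected are being discarded.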
Example #2: Response Variable is Categorical
Consider an example where the ice cream data frame contains a new categorical column indicating whether a given day was cloudy or not.
df <- data.frame(day=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 temperature=c(10, 40, 22, 19, 28, 15, 20, 17, 13, 30),
                 ice_cream_sold=c(5, 200, 40, 38, 100, 40, 55, 10, 30, 150),
                 is_cloudy = c('Yes', 'No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'No')
)
df
   day temperature ice_cream_sold is_cloudy
1    1          10              5       Yes
2    2          40            200        No
3    3          22             40        No
4    4          19             38       Yes
5    5          28            100        No
6    6          15             40        No
7    7          20             55        No
8    8          17             10       Yes
9    9          13             30       Yes
10  10          30            150        No
We want to fit a model using is_cloudy as the response variable and ice_cream_sold as the predictor variable:
model <- lm(is_cloudy ~ ice_cream_sold, df)
Let’s run the code to see the result:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
The error occurs because we have a categorical response variable. We can only use continuous numerical values for our response variable in linear regression.
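The warning about coercion hints at what goes wrong: the text values cannot be converted to numbers, so they all become NA. The sketch below reproduces this directly. It sets stringsAsFactors = FALSE explicitly, which is an assumption made here so that is_cloudy is a character column regardless of the R version.

```r
# Example data frame with the categorical column stored as text
df <- data.frame(
  day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  temperature = c(10, 40, 22, 19, 28, 15, 20, 17, 13, 30),
  ice_cream_sold = c(5, 200, 40, 38, 100, 40, 55, 10, 30, 150),
  is_cloudy = c("Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Check how the intended response variable is stored
response_class <- class(df$is_cloudy)
response_class

# Converting the text to numbers yields NA for every row
# (the coercion warning is suppressed to keep the output short)
coerced <- suppressWarnings(as.numeric(df$is_cloudy))
coerced
```

Checking class() or is.numeric() on the response before calling lm() catches this kind of problem early.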
Solution #1: Use Logistic Regression
As the response variable can only have two outcomes (binary), we can perform logistic regression using the generalized linear model function glm(). We have to specify the parameter family=binomial().
model <- glm(as.factor(is_cloudy) ~ ice_cream_sold, data = df, family=binomial())
summary(model)
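Note that we have to tell R to treat is_cloudy as a factor, because here it is stored as a character vector, which glm() cannot use directly as a binomial response. With a factor response, the first level ("No") is treated as failure and the other level ("Yes") as success, so the model estimates the probability of a cloudy day. Let’s run the code to get the coefficients of the model: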
Call:
glm(formula = as.factor(is_cloudy) ~ ice_cream_sold, family = binomial(), data = df)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-7.672e-05  -2.100e-08  -2.100e-08   2.100e-08   1.046e-04  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      753.70  222362.45   0.003    0.997
ice_cream_sold   -19.33    5694.43  -0.003    0.997

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.346e+01  on 9  degrees of freedom
Residual deviance: 2.272e-08  on 8  degrees of freedom
AIC: 4

Number of Fisher Scoring iterations: 25
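The call runs without the original error and returns a fitted logistic regression model. However, the estimates should be treated with caution: the standard errors are far larger than the coefficients and the fit needed 25 Fisher scoring iterations. This pattern typically indicates complete separation, which is the case in this small data set because every cloudy day has lower sales than every non-cloudy day.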
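The very large standard errors in the output above often point to complete separation, where the predictor splits the two classes perfectly. A minimal way to check this for a single numeric predictor, sketched under the assumption that the two groups are labelled "Yes" and "No" as in the example, is to compare the range of sales in each group.

```r
# Example data frame from above (only the columns needed for the check)
df <- data.frame(
  ice_cream_sold = c(5, 200, 40, 38, 100, 40, 55, 10, 30, 150),
  is_cloudy = c("Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Split the predictor values by class
sold_by_group <- split(df$ice_cream_sold, df$is_cloudy)

# Highest sales on a cloudy day and lowest sales on a non-cloudy day
max_cloudy <- max(sold_by_group[["Yes"]])
min_clear  <- min(sold_by_group[["No"]])

# TRUE when the two groups do not overlap at all
separated <- max_cloudy < min_clear

max_cloudy  # 38
min_clear   # 40
separated   # TRUE
```

When the groups do not overlap, the maximum likelihood estimates are not finite, so the reported coefficients mainly reflect where the fitting algorithm stopped.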
Solution #2: Swap the variables
Alternatively, the predictor and response variables may be the wrong way round. The variable ice_cream_sold is the outcome and the variable is_cloudy is the predictor. Let’s look at the revised code:
model <- lm(ice_cream_sold ~ is_cloudy, df)
summary(model)
Let’s run the code to fit the linear regression model and get the model coefficients.
Call:
lm(formula = ice_cream_sold ~ is_cloudy, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-57.500 -35.812  -4.125  15.250 102.500 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)     97.50      21.62   4.510  0.00198 **
is_cloudyYes   -76.75      34.18  -2.245  0.05497 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 52.96 on 8 degrees of freedom
Multiple R-squared:  0.3866,	Adjusted R-squared:  0.3099 
F-statistic: 5.041 on 1 and 8 DF,  p-value: 0.05497
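With a single two-level categorical predictor, the coefficients have a direct reading: the intercept is the mean of the reference group ("No") and is_cloudyYes is the difference between the two group means. The sketch below checks this against the example data.

```r
# Example data frame from above (only the columns used by the model)
df <- data.frame(
  ice_cream_sold = c(5, 200, 40, 38, 100, 40, 55, 10, 30, 150),
  is_cloudy = c("Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

model <- lm(ice_cream_sold ~ is_cloudy, df)

# Mean ice cream sales for each group, as a named numeric vector
group_means <- sapply(split(df$ice_cream_sold, df$is_cloudy), mean)

# Intercept: mean sales on non-cloudy days
intercept <- coef(model)[["(Intercept)"]]

# Slope: mean on cloudy days minus mean on non-cloudy days
cloudy_effect <- coef(model)[["is_cloudyYes"]]

group_means[["No"]]                          # 97.5
group_means[["Yes"]] - group_means[["No"]]   # -76.75
```

Both values agree with the Estimate column in the summary output.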
Summary
Congratulations on reading to the end of this tutorial!
For further reading on R-related errors, see the following articles:
- How to Solve R Error: $ operator is invalid for atomic vectors
- How to Solve R Error in apply: dim(X) must have a positive length
- How to Solve R Error in eval(predvars, data, env): object not found
- How to Solve R Error: list object cannot be coerced to type double
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.