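*This error occurs when you try to fit a linear regression model in R with the lm() function and the predictor or response variable contains non-finite values that lm() cannot drop, most commonly infinity (Inf or -Inf). With R's default na.action (na.omit), rows containing NA or Not a Number (NaN) are removed before fitting, but infinite values are not treated as missing.*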

*You can solve this error by replacing the NaN and Inf values with NA values, for example:*

`df[is.na(df) | df=="Inf"] = NA`

*The error can also occur if you do not provide a continuous numeric response variable when performing linear regression, for example, Yes/No. In that case, you may have your predictor and response variables the wrong way round, or you may need to fit a logistic regression model instead.*

*This tutorial will go through the error in detail and how to solve it with code examples.*

## Example #1: Predictor and/or Response Variable contains NaN or Inf

Consider the following data frame that contains information about the amount of ice cream sold over ten days and the temperature for each of the days in Celsius.

    df <- data.frame(day=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     temperature=c(10, NA, 22, 19, 28, 15, 20, 17, 13, 30),
                     ice_cream_sold=c(5, NaN, 40, 38, 100, 40, Inf, 10, 30, 150))
    df

       day temperature ice_cream_sold
    1    1          10              5
    2    2          NA            NaN
    3    3          22             40
    4    4          19             38
    5    5          28            100
    6    6          15             40
    7    7          20            Inf
    8    8          17             10
    9    9          13             30
    10  10          30            150

The data frame contains some NaN and Inf values.

Let’s attempt to fit a linear regression model using temperature as the predictor variable and ice_cream_sold as the response variable.

    model <- lm(ice_cream_sold ~ temperature, df)

Let’s run the code to see what happens:

    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
      NA/NaN/Inf in 'y'

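The R interpreter raises an error because the response variable ice_cream_sold contains an Inf value (row 7). Row 2, which holds NA and NaN, is dropped automatically by the default na.action, but Inf is not treated as missing, so it reaches lm.fit() and triggers the error.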
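
Before changing the data, it can help to see which columns hold the problematic values. The following is a minimal sketch using base R's is.na() and is.infinite() on the same data; the names `na_counts` and `inf_counts` are illustrative only.

```r
# Same values as the example data frame above
df <- data.frame(
  day = 1:10,
  temperature = c(10, NA, 22, 19, 28, 15, 20, 17, 13, 30),
  ice_cream_sold = c(5, NaN, 40, 38, 100, 40, Inf, 10, 30, 150)
)

# is.na() is TRUE for both NA and NaN
na_counts <- sapply(df, function(x) sum(is.na(x)))

# is.infinite() is TRUE only for Inf and -Inf
inf_counts <- sapply(df, function(x) sum(is.infinite(x)))

na_counts
inf_counts
```

Here only ice_cream_sold has a non-zero count of infinite values, which matches the `'y'` in the error message.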

### Solution

We can solve this error by replacing the NaN and Inf values with NA. By default, lm() drops any row that contains NA before fitting the linear regression model. Let’s look at the revised code:

    df[is.na(df) | df=="Inf"] = NA
    df

Let’s look at the updated data frame:

       day temperature ice_cream_sold
    1    1          10              5
    2    2          NA             NA
    3    3          22             40
    4    4          19             38
    5    5          28            100
    6    6          15             40
    7    7          20             NA
    8    8          17             10
    9    9          13             30
    10  10          30            150
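
Note that the comparison `df=="Inf"` only matches positive infinity, so a `-Inf` value would survive the replacement. As a hedged alternative, the sketch below sets every non-finite value in the numeric columns to NA with is.finite(). The data frame `df2` is hypothetical and is not part of the ice cream example.

```r
# Hypothetical data containing NA, NaN, Inf and -Inf
df2 <- data.frame(
  temperature = c(10, NA, 22, 19),
  ice_cream_sold = c(5, NaN, -Inf, Inf)
)

# Only touch numeric columns so character or factor columns are left alone
numeric_cols <- sapply(df2, is.numeric)

# is.finite() is FALSE for NA, NaN, Inf and -Inf
df2[numeric_cols] <- lapply(df2[numeric_cols], function(x) {
  x[!is.finite(x)] <- NA
  x
})

df2
```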

Now we can fit the linear regression model and get the coefficients of the model using summary():

    model <- lm(ice_cream_sold ~ temperature, df)
    summary(model)

    Call:
    lm(formula = ice_cream_sold ~ temperature, data = df)

    Residuals:
       Min     1Q Median     3Q    Max
    -28.73 -15.96   2.43  15.42  31.50

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -68.127     25.983  -2.622  0.03948 *
    temperature    6.221      1.277   4.872  0.00279 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 23.8 on 6 degrees of freedom
      (2 observations deleted due to missingness)
    Multiple R-squared:  0.7982,    Adjusted R-squared:  0.7646
    F-statistic: 23.73 on 1 and 6 DF,  p-value: 0.00279
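
The line "2 observations deleted due to missingness" reports how many rows were dropped. To see which rows, one option is to inspect the fitted model with nobs() and na.action(), as in this small sketch. It rebuilds the cleaned data frame so that it runs on its own; the name `dropped` is illustrative.

```r
# Cleaned data from the step above (NaN and Inf already replaced by NA)
df <- data.frame(
  day = 1:10,
  temperature = c(10, NA, 22, 19, 28, 15, 20, 17, 13, 30),
  ice_cream_sold = c(5, NA, 40, 38, 100, 40, NA, 10, 30, 150)
)
model <- lm(ice_cream_sold ~ temperature, df)

# Number of rows actually used in the fit
nobs(model)

# Indices of the rows removed by the default na.action (na.omit)
dropped <- as.integer(na.action(model))
dropped
```

The dropped rows are 2 and 7, the two rows that contain NA in the updated data frame.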

## Example #2: Response Variable is Categorical

Consider an example where the ice cream data frame contains a new categorical column indicating whether a given day was cloudy or not.

    df <- data.frame(day=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     temperature=c(10, 40, 22, 19, 28, 15, 20, 17, 13, 30),
                     ice_cream_sold=c(5, 200, 40, 38, 100, 40, 55, 10, 30, 150),
                     is_cloudy = c('Yes', 'No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'No')
    )
    df

       day temperature ice_cream_sold is_cloudy
    1    1          10              5       Yes
    2    2          40            200        No
    3    3          22             40        No
    4    4          19             38       Yes
    5    5          28            100        No
    6    6          15             40        No
    7    7          20             55        No
    8    8          17             10       Yes
    9    9          13             30       Yes
    10  10          30            150        No

We want to fit a model using is_cloudy as the response variable and ice_cream_sold as the predictor variable:

    model <- lm(is_cloudy ~ ice_cream_sold, df)

Let’s run the code to see the result:

    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
      NA/NaN/Inf in 'y'
    In addition: Warning message:
    In storage.mode(v) <- "double" : NAs introduced by coercion

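The error occurs because is_cloudy is a categorical variable stored as character strings. lm() tries to convert the response to numeric values, which turns every "Yes" and "No" into NA (hence the coercion warning), and then fails because the response contains no usable values. Linear regression requires a continuous numeric response variable.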
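
The coercion can be reproduced outside lm(). This minimal sketch uses the same is_cloudy values as a plain vector and shows what happens when they are converted to numbers; `coerced` is an illustrative name.

```r
is_cloudy <- c("Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No")

# The column is a character vector, not a numeric one
class(is_cloudy)

# Converting text such as "Yes" to a number yields NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
coerced <- suppressWarnings(as.numeric(is_cloudy))
coerced
```

Every element of `coerced` is NA, which is why lm.fit() reports NA/NaN/Inf in 'y'.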

### Solution #1: Use Logistic Regression

As the response variable can only have two outcomes (binary), we can perform logistic regression using the generalized linear model function glm(). We have to specify the parameter family=binomial().

    model <- glm(as.factor(is_cloudy) ~ ice_cream_sold, data = df, family=binomial())
    summary(model)

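Note that we have to tell R to treat is_cloudy as a factor, because it is stored as a character vector and a binomial glm() needs a response such as a factor or 0/1 values. With a factor response, glm() treats the first level (here "No") as failure and models the probability of the other level ("Yes"). Let’s run the code to get the coefficients of the model: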

    Call:
    glm(formula = as.factor(is_cloudy) ~ ice_cream_sold, family = binomial(),
        data = df)

    Deviance Residuals:
           Min          1Q      Median          3Q         Max
    -7.672e-05  -2.100e-08  -2.100e-08   2.100e-08   1.046e-04

    Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
    (Intercept)       753.70  222362.45   0.003    0.997
    ice_cream_sold    -19.33    5694.43  -0.003    0.997

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 1.346e+01  on 9  degrees of freedom
    Residual deviance: 2.272e-08  on 8  degrees of freedom
    AIC: 4

    Number of Fisher Scoring iterations: 25

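The call now runs without the original error, so the logistic regression model is fitted. However, the very large standard errors, the near-zero residual deviance, and the 25 Fisher scoring iterations suggest that ice_cream_sold perfectly separates cloudy from non-cloudy days in this small data set, so the coefficient estimates should not be interpreted literally.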
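
The very large standard errors in the output above are a typical sign of perfect separation. A quick way to check for it, sketched here with the same values as plain vectors, is to compare the range of ice_cream_sold within each group; `ranges` and `separated` are illustrative names.

```r
ice_cream_sold <- c(5, 200, 40, 38, 100, 40, 55, 10, 30, 150)
is_cloudy <- c("Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No")

# Minimum and maximum sales within each group
ranges <- tapply(ice_cream_sold, is_cloudy, range)
ranges

# TRUE when every cloudy day sold less than every non-cloudy day
separated <- max(ice_cream_sold[is_cloudy == "Yes"]) <
  min(ice_cream_sold[is_cloudy == "No"])
separated
```

The two ranges do not overlap (cloudy days top out at 38, non-cloudy days start at 40), so a single threshold classifies every day correctly and the maximum likelihood estimates diverge.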

### Solution #2: Swap the variables

Alternatively, the predictor and response variables may be the wrong way round. The variable `ice_cream_sold` is the outcome and the variable `is_cloudy` is the predictor. Let’s look at the revised code:

    model <- lm(ice_cream_sold ~ is_cloudy, df)
    summary(model)

Let’s run the code to fit the linear regression model and get the model coefficients.

    Call:
    lm(formula = ice_cream_sold ~ is_cloudy, data = df)

    Residuals:
        Min      1Q  Median      3Q     Max
    -57.500 -35.812  -4.125  15.250 102.500

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)     97.50      21.62   4.510  0.00198 **
    is_cloudyYes   -76.75      34.18  -2.245  0.05497 .
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 52.96 on 8 degrees of freedom
    Multiple R-squared:  0.3866,    Adjusted R-squared:  0.3099
    F-statistic: 5.041 on 1 and 8 DF,  p-value: 0.05497
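
With a single two-level categorical predictor, the coefficients can be read as group means: the intercept is the mean of the reference level ("No") and is_cloudyYes is the difference between the two means. The sketch below checks this with tapply() on the same values; `means` is an illustrative name.

```r
ice_cream_sold <- c(5, 200, 40, 38, 100, 40, 55, 10, 30, 150)
is_cloudy <- c("Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No")

# Mean ice cream sales for each level of is_cloudy
means <- tapply(ice_cream_sold, is_cloudy, mean)
means

# Difference between cloudy and non-cloudy days
means[["Yes"]] - means[["No"]]
```

The mean for "No" is 97.5 and the difference is -76.75, matching the estimates in the summary output.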

## Summary

Congratulations on reading to the end of this tutorial!

For further reading on R-related errors, go to the articles:

- How to Solve R Error: $ operator is invalid for atomic vectors
- How to Solve R Error in apply: dim(X) must have a positive length
- How to Solve R Error in eval(predvars, data, env): object not found
- How to Solve R Error: list object cannot be coerced to type double

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!