*This error occurs when you attempt to fit a model and two or more predictor variables are perfectly correlated.*

*You can solve this error by using the cor() function to identify the variables with a perfect correlation and drop one of the variables from the model.*

*This tutorial will go through the error in detail and show how to solve it with code examples.*

## Example

Let’s look at an example of fitting a linear regression model using a data frame. First, we will define a data frame containing the weight in kilograms and height in metres and centimetres of 10 subjects.

```r
df <- data.frame(
  weight = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)
summary(df)
```

Calling `summary(df)` prints the summary statistics for each column:

```
     weight          height_m       height_cm    
 Min.   : 47.00   Min.   :1.300   Min.   :130.0  
 1st Qu.: 60.75   1st Qu.:1.540   1st Qu.:154.0  
 Median : 74.50   Median :1.700   Median :170.0  
 Mean   : 75.20   Mean   :1.685   Mean   :168.5  
 3rd Qu.: 91.25   3rd Qu.:1.817   3rd Qu.:181.8  
 Max.   :102.00   Max.   :2.000   Max.   :200.0  
```

Next, we will fit a linear regression model on the data and print the model summary to the console:

```r
model <- lm(weight ~ height_m + height_cm, data = df)
summary(model)
```

```
Call:
lm(formula = weight ~ height_m + height_cm, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-21.6525  -5.4040  -3.3657   0.4445  29.2511 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -29.10      40.15  -0.725   0.4893  
height_m       61.90      23.67   2.616   0.0309 *
height_cm         NA         NA      NA       NA  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.81 on 8 degrees of freedom
Multiple R-squared:  0.461,	Adjusted R-squared:  0.3936 
F-statistic: 6.841 on 1 and 8 DF,  p-value: 0.03086
```

Note that after the residuals and before the coefficients, there is the message:

```
Coefficients: (1 not defined because of singularities)
```

The error occurred because the two predictor variables `height_m` and `height_cm` are perfectly correlated.

Perfectly correlated variables do not provide unique information in the regression model. It is not possible to vary the predictor variable `height_m` to see the effect on the response variable `weight` without the predictor variable `height_cm` also moving.

Therefore, it is impossible to estimate values for every coefficient in the regression model, which we can see from the `NA` values for the coefficient estimate of `height_cm`.
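The `NA` estimates can also be found programmatically rather than by reading the summary. The following is a minimal sketch that rebuilds the same data frame and model; the name `undefined_terms` is chosen here for illustration and is not part of the original example.

```r
# Same data as in the example above
df <- data.frame(
  weight = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

model <- lm(weight ~ height_m + height_cm, data = df)

# coef() returns a named vector; coefficients that lm() could not
# estimate because of the singularity are stored as NA
coefs <- coef(model)
undefined_terms <- names(coefs)[is.na(coefs)]
print(undefined_terms)
```

For this data the result should be `height_cm`, matching the `NA` row in the summary table.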

The values for `height_cm` are the values for `height_m` multiplied by `100`. A predictor variable that is a multiple of another is an example of perfect multicollinearity, which means there is an exact linear relationship between the two variables.
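One way to confirm the exact linear relationship is to compare the two columns directly and to check the rank of the design matrix. This is a sketch under the assumption that the data frame is defined as above; `is_multiple`, `X` and `design_rank` are illustrative names.

```r
# Same data as in the example above
df <- data.frame(
  weight = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

# all.equal() compares with a small numeric tolerance, which avoids
# false negatives caused by floating-point rounding
is_multiple <- isTRUE(all.equal(df$height_cm, 100 * df$height_m))
print(is_multiple)

# The design matrix has three columns (intercept, height_m, height_cm),
# but one column is a multiple of another, so its rank is lower
X <- model.matrix(~ height_m + height_cm, data = df)
design_rank <- qr(X)$rank
print(c(columns = ncol(X), rank = design_rank))
```

A rank smaller than the number of columns indicates that at least one predictor is a linear combination of the others, which is the situation `lm()` reports as a singularity.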

### Solution

The first step of solving the error involves calling the `cor()` function to get a correlation matrix and examining which variables have a correlation of exactly 1 with each other.

```r
cor(df)
```

```
             weight  height_m height_cm
weight    1.0000000 0.6789428 0.6789428
height_m  0.6789428 1.0000000 1.0000000
height_cm 0.6789428 1.0000000 1.0000000
```

We can see that the variables `height_m` and `height_cm` are perfectly correlated.
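With more predictors, reading the correlation matrix by eye becomes tedious. The sketch below lists the perfectly correlated pairs automatically. The tolerance `tol` and the name `perfect_pairs` are assumptions made for this illustration; a tolerance is used because floating-point rounding can leave a correlation marginally below 1.

```r
# Same data as in the example above
df <- data.frame(
  weight = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

# Only the predictors matter here, so the response is left out
predictors <- df[, c("height_m", "height_cm")]
cm <- cor(predictors)

# Assumed tolerance for treating a correlation as "exactly" +1 or -1
tol <- 1e-8

# lower.tri() keeps each pair once and skips the diagonal
idx <- which(abs(cm) > 1 - tol & lower.tri(cm), arr.ind = TRUE)
perfect_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]]
)
print(perfect_pairs)
```

Note that a pairwise check like this only finds dependencies between two variables at a time.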

Next, we can drop either of the two variables from the model. Let’s drop `height_cm` and fit the linear regression model.

```r
model <- lm(weight ~ height_m, data = df)
summary(model)
```

Let’s print the summary of the model.

```
Call:
lm(formula = weight ~ height_m, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-21.6525  -5.4040  -3.3657   0.4445  29.2511 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -29.10      40.15  -0.725   0.4893  
height_m       61.90      23.67   2.616   0.0309 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.81 on 8 degrees of freedom
Multiple R-squared:  0.461,	Adjusted R-squared:  0.3936 
F-statistic: 6.841 on 1 and 8 DF,  p-value: 0.03086
```

Note that the “`not defined because of singularities`” error is gone, and every coefficient in the model, including `height_m`, now has an estimate.
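As an alternative to retyping the formula, the redundant term can be removed from an existing fit with `update()`. This is a sketch only; `full_model` and `reduced_model` are names introduced here for illustration.

```r
# Same data as in the example above
df <- data.frame(
  weight = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

full_model <- lm(weight ~ height_m + height_cm, data = df)

# ". ~ . - height_cm" keeps the response and all other terms,
# and removes only the redundant predictor
reduced_model <- update(full_model, . ~ . - height_cm)

print(coef(reduced_model))

# FALSE means every remaining coefficient was estimated
print(anyNA(coef(reduced_model)))
```

The fitted values of the two models are the same, because the dropped variable carried no information beyond what `height_m` already provides.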

## Summary

Congratulations on reading to the end of this tutorial!

For further reading on R-related errors, see the following articles:

- How to Count the Number of NA in R
- How to Solve R Error: non-conformable arguments
- How to Solve R Error in n(): Must be used inside dplyr verbs

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!
