
# How to Solve R Error: not defined because of singularities

by | Programming, R, Tips

This message appears when you fit a model in which two or more predictor variables have an exact linear relationship, for example when they are perfectly correlated.

You can solve this error by using the `cor()` function to identify the variables with a perfect correlation and drop one of the variables from the model.

This tutorial will go through the error in detail and show how to solve it with code examples.

## Example

Let’s look at an example of fitting a linear regression model using a data frame. First, we will define a data frame containing the weight in kilograms and height in metres and centimetres of 10 subjects.

```
df <- data.frame(weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
                 height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
                 height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166))
summary(df)
```

The call to `summary()` prints descriptive statistics for each column of the data frame:

```
     weight          height_m       height_cm
 Min.   : 47.00   Min.   :1.300   Min.   :130.0
 1st Qu.: 60.75   1st Qu.:1.540   1st Qu.:154.0
 Median : 74.50   Median :1.700   Median :170.0
 Mean   : 75.20   Mean   :1.685   Mean   :168.5
 3rd Qu.: 91.25   3rd Qu.:1.817   3rd Qu.:181.8
 Max.   :102.00   Max.   :2.000   Max.   :200.0
```

Next, we will fit a linear regression model on the data and print the model summary to the console:

```
model <- lm(weight ~ height_m + height_cm, data = df)
summary(model)
```

```
Call:
lm(formula = weight ~ height_m + height_cm, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-21.6525  -5.4040  -3.3657   0.4445  29.2511

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -29.10      40.15  -0.725   0.4893
height_m       61.90      23.67   2.616   0.0309 *
height_cm         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.81 on 8 degrees of freedom
Multiple R-squared:  0.461,	Adjusted R-squared:  0.3936
F-statistic: 6.841 on 1 and 8 DF,  p-value: 0.03086
```

Note that after the residuals and before the coefficients, there is the message:

`Coefficients: (1 not defined because of singularities)`

The error occurred because the two predictor variables `height_m` and `height_cm` are perfectly correlated.

Perfectly correlated variables do not provide unique information in the regression model. It is not possible to vary the predictor variable `height_m` to see the effect on the response variable `weight` without the predictor variable `height_cm` also moving.

Therefore, it is impossible to estimate values for every coefficient in the regression model, which we can see with the `NA` values for the coefficient estimate of `height_cm`.
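Reading the summary is not the only way to find the affected terms. As a minimal sketch, the snippet below rebuilds the example data and lists every coefficient that R reported as `NA`. The variable name `dropped` is only illustrative and is not part of the original example.

```r
# Rebuild the example data so the snippet runs on its own
df <- data.frame(
  weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

model <- lm(weight ~ height_m + height_cm, data = df)

# Coefficients that could not be estimated are stored as NA
dropped <- names(which(is.na(coef(model))))
print(dropped)
```

This can be handy when a model has many predictors and the `NA` rows are easy to overlook in the printed summary.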

The values for `height_cm` are the values for `height_m` multiplied by `100`. A predictor variable that is a multiple of another is an example of perfect multicollinearity, which means there is an exact linear relationship between the two variables.
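The exact linear relationship can also be checked directly. The following sketch, which assumes the same example data, compares the number of columns in the design matrix with its numerical rank and confirms that `height_cm` equals `100 * height_m` up to floating-point tolerance. The name `X` is an illustrative choice.

```r
# Rebuild the example data so the snippet runs on its own
df <- data.frame(
  weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

# Design matrix used by lm(): intercept, height_m, height_cm
X <- model.matrix(weight ~ height_m + height_cm, data = df)

n_columns <- ncol(X)      # columns the model tries to estimate
n_rank    <- qr(X)$rank   # linearly independent columns

# A rank below the column count signals perfect multicollinearity
print(c(columns = n_columns, rank = n_rank))

# One height column is a constant multiple of the other
print(isTRUE(all.equal(df$height_cm, 100 * df$height_m)))
```

When the rank is smaller than the number of columns, at least one coefficient cannot be estimated.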

### Solution

The first step in solving the error is to call the `cor()` function to get a correlation matrix and check which pairs of variables have a correlation of exactly 1 or -1.

`cor(df)`
```
             weight  height_m height_cm
weight    1.0000000 0.6789428 0.6789428
height_m  0.6789428 1.0000000 1.0000000
height_cm 0.6789428 1.0000000 1.0000000
```

We can see that the variables `height_m` and `height_cm` are perfectly correlated.
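With more than a handful of predictors, scanning the correlation matrix by eye becomes tedious. The sketch below is one possible way to list perfectly correlated pairs programmatically. The tolerance `1e-8` and the names `predictors`, `cm` and `perfect_pairs` are assumptions made for this illustration. Pairwise correlations cannot reveal a dependence that involves three or more predictors at once, so the snippet also calls the base R function `alias()`, which reports linear dependencies among the model terms.

```r
# Rebuild the example data so the snippet runs on its own
df <- data.frame(
  weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

# Correlation matrix of the predictors only
predictors <- df[, c("height_m", "height_cm")]
cm <- cor(predictors)

# Keep each pair once and ignore the diagonal
cm[upper.tri(cm, diag = TRUE)] <- NA

# Pairs whose absolute correlation is 1 within a small tolerance
idx <- which(abs(cm) > 1 - 1e-8, arr.ind = TRUE)
perfect_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]]
)
print(perfect_pairs)

# alias() reports linear dependencies among the terms of a fitted model
dependencies <- alias(lm(weight ~ height_m + height_cm, data = df))
print(dependencies)
```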

Next, we can drop either of the two variables from the model. Let’s drop `height_cm` and fit the linear regression model.

```
model <- lm(weight ~ height_m, data = df)
summary(model)
```

The model summary now looks like this:

```
Call:
lm(formula = weight ~ height_m, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-21.6525  -5.4040  -3.3657   0.4445  29.2511

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -29.10      40.15  -0.725   0.4893
height_m       61.90      23.67   2.616   0.0309 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.81 on 8 degrees of freedom
Multiple R-squared:  0.461,	Adjusted R-squared:  0.3936
F-statistic: 6.841 on 1 and 8 DF,  p-value: 0.03086
```

Note that the “`not defined because of singularities`” message is gone, and every coefficient in the model, including `height_m`, now has an estimate.
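The two summaries report the same estimates and fit statistics, which suggests that dropping the redundant variable loses no information. As a quick check under that assumption, the sketch below fits both models and compares their fitted values and shared coefficients. The names `full` and `reduced` are illustrative.

```r
# Rebuild the example data so the snippet runs on its own
df <- data.frame(
  weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
  height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
  height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166)
)

full    <- lm(weight ~ height_m + height_cm, data = df)  # triggers the message
reduced <- lm(weight ~ height_m, data = df)              # redundant term removed

# Compare the predictions of the two models
same_fit <- isTRUE(all.equal(fitted(full), fitted(reduced)))

# Compare the coefficients that both models estimate
shared <- c("(Intercept)", "height_m")
same_coef <- isTRUE(all.equal(coef(full)[shared], coef(reduced)[shared]))

print(c(same_fit = same_fit, same_coef = same_coef))
```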

## Summary

Congratulations on reading to the end of this tutorial!

For further reading on R-related errors, go to the article:

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!