# How to Solve R Error: not defined because of singularities

by | Programming, R, Tips

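This error occurs when you attempt to fit a model and two or more predictor variables are perfectly correlated. Strictly speaking, R does not stop with an error. `lm()` still fits the model, but it reports `NA` for any coefficient it cannot estimate and prints a "not defined because of singularities" note in the model summary.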

You can solve this error by using the `cor()` function to identify the variables with a perfect correlation and drop one of the variables from the model.
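One caveat: `cor()` only compares variables in pairs. When one predictor is an exact linear combination of two or more others, no pairwise correlation needs to equal 1, yet `lm()` still reports a singularity. Base R's `alias()` function (from the `stats` package) names the aliased term directly. The sketch below uses a small made-up data frame. The names `d`, `x1`, `x2`, `x3` and `y` are hypothetical, and `x3` is constructed as `x1 + x2`.

```r
# Hypothetical data: x3 is an exact linear combination of x1 and x2
d <- data.frame(x1 = 1:10,
                x2 = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))
d$x3 <- d$x1 + d$x2
d$y  <- c(5, 4, 9, 7, 12, 18, 10, 16, 15, 14)

# No pair of predictors is perfectly correlated...
round(cor(d[, c("x1", "x2", "x3")]), 3)

# ...but the fit still has an undefined (NA) coefficient for x3
fit <- lm(y ~ x1 + x2 + x3, data = d)
coef(fit)

# alias() reports which term is an exact combination of the others
alias(fit)
```

In this sketch every pairwise correlation stays well below 1, so `cor()` alone would not flag the problem, while `alias()` should list `x3` as the aliased term together with the coefficients that rebuild it from the other predictors.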

This tutorial will go through the error in detail and show how to solve it with code examples.

## Example

Let’s look at an example of fitting a linear regression model using a data frame. First, we will define a data frame containing the weight in kilograms and height in metres and centimetres of 10 subjects.

```r
# height_cm holds the same heights as height_m, expressed in centimetres
df <- data.frame(weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
                 height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
                 height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166))
summary(df)
```

Running this code prints a summary of each column:

```
     weight          height_m       height_cm
 Min.   : 47.00   Min.   :1.300   Min.   :130.0
 1st Qu.: 60.75   1st Qu.:1.540   1st Qu.:154.0
 Median : 74.50   Median :1.700   Median :170.0
 Mean   : 75.20   Mean   :1.685   Mean   :168.5
 3rd Qu.: 91.25   3rd Qu.:1.817   3rd Qu.:181.8
 Max.   :102.00   Max.   :2.000   Max.   :200.0
```

Next, we will fit a linear regression model on the data and print the model summary to the console:

```r
# Fit weight on both height columns, then print the model summary
model <- lm(weight ~ height_m + height_cm, data = df)
summary(model)
```
```
Call:
lm(formula = weight ~ height_m + height_cm, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-21.6525  -5.4040  -3.3657   0.4445  29.2511

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -29.10      40.15  -0.725   0.4893
height_m       61.90      23.67   2.616   0.0309 *
height_cm         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.81 on 8 degrees of freedom
Multiple R-squared:  0.461,	Adjusted R-squared:  0.3936
F-statistic: 6.841 on 1 and 8 DF,  p-value: 0.03086
```

Note that after the residuals and before the coefficients, there is the message:

`Coefficients: (1 not defined because of singularities)`

The error occurred because the two predictor variables `height_m` and `height_cm` are perfectly correlated.

Perfectly correlated variables do not provide unique information to the regression model. It is not possible to vary the predictor variable `height_m` to see its effect on the response variable `weight` without the predictor variable `height_cm` changing as well.

Therefore, it is impossible to estimate a unique value for every coefficient in the regression model. R handles this by dropping the redundant term, which is why the coefficient estimate for `height_cm` is `NA`.
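When you fit many models, it can help to check for dropped terms in code rather than by reading the summary. A minimal sketch, assuming the same `df` data frame as above: `coef()` stores `NA` for each coefficient that could not be estimated, so `is.na()` picks them out, and the `aliased` component of `summary()` gives the same information as a named logical vector.

```r
df <- data.frame(weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
                 height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
                 height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166))

model <- lm(weight ~ height_m + height_cm, data = df)

# Coefficients that lm() could not estimate are stored as NA
est <- coef(model)
names(est)[is.na(est)]   # expected: "height_cm", matching the summary above

# summary.lm() also flags aliased coefficients with TRUE
summary(model)$aliased
```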

The values for `height_cm` are the values for `height_m` multiplied by `100`. A predictor variable that is a multiple of another is an example of perfect multicollinearity, which means there is an exact linear relationship between the two variables.
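To see why only one of the two slopes can be estimated, write the model with both predictors and substitute the exact relationship $h_{cm} = 100\,h_m$. The symbols $\beta_0, \beta_1, \beta_2$, $h_m$ and $h_{cm}$ are introduced here for illustration only.

```latex
\begin{aligned}
\text{weight} &= \beta_0 + \beta_1 h_m + \beta_2 h_{cm} + \varepsilon \\
              &= \beta_0 + \beta_1 h_m + \beta_2 \,(100\, h_m) + \varepsilon \\
              &= \beta_0 + (\beta_1 + 100\,\beta_2)\, h_m + \varepsilon
\end{aligned}
```

The data can only pin down the combined slope $\beta_1 + 100\,\beta_2$; any pair $(\beta_1, \beta_2)$ with the same combined value fits equally well. In the summary above, R effectively sets $\beta_2$ aside (reported as `NA`) and assigns the combined slope, 61.90, to `height_m`.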

### Solution

The first step of solving the error involves calling the `cor()` function to get a correlation matrix and examining which variables have a correlation of exactly 1 (or -1) with each other.

```r
cor(df)
```

```
            weight  height_m height_cm
weight    1.0000000 0.6789428 0.6789428
height_m  0.6789428 1.0000000 1.0000000
height_cm 0.6789428 1.0000000 1.0000000
```

We can see that the variables `height_m` and `height_cm` are perfectly correlated.
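With many columns, scanning the matrix by eye gets tedious. Below is a small helper, `find_perfect_pairs()` (a hypothetical name, not part of base R), that lists every pair of columns whose absolute correlation is within a tolerance of 1. The tolerance value is an assumption: values such as 1.7 are not stored exactly in binary, so a computed correlation can land a hair away from 1, and taking `abs()` also catches perfect negative correlations.

```r
# Hypothetical helper: list column pairs whose |correlation| is numerically 1
find_perfect_pairs <- function(data, tol = 1e-8) {
  r <- cor(data)
  # Only look at the upper triangle so each pair is reported once
  hits <- which(abs(r) > 1 - tol & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[hits[, "row"]],
             var2 = colnames(r)[hits[, "col"]],
             r    = r[hits])
}

df <- data.frame(weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
                 height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
                 height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166))

find_perfect_pairs(df)
```

Like `cor()` itself, this only detects pairwise relationships; a predictor that is a combination of several others will not show up here.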

Next, we can drop either of the two variables from the model. Let’s drop `height_cm` and fit the linear regression model.

```r
# Refit the model without the redundant height_cm column
model <- lm(weight ~ height_m, data = df)
summary(model)
```

The model summary now reads:

```
Call:
lm(formula = weight ~ height_m, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-21.6525  -5.4040  -3.3657   0.4445  29.2511

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -29.10      40.15  -0.725   0.4893
height_m       61.90      23.67   2.616   0.0309 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.81 on 8 degrees of freedom
Multiple R-squared:  0.461,	Adjusted R-squared:  0.3936
F-statistic: 6.841 on 1 and 8 DF,  p-value: 0.03086
```

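Note that the "not defined because of singularities" message is gone, and we have a coefficient estimate for `height_m`. The estimates are identical to those of the first model because R had already set `height_cm` aside there.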
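Since either variable can be dropped, it is worth confirming that the choice only changes the units of the slope. A sketch, assuming the same `df` as above (the model names `fit_m` and `fit_cm` are illustrative): keeping `height_cm` instead of `height_m` should give a slope 100 times smaller (per centimetre rather than per metre) and exactly the same fitted values.

```r
df <- data.frame(weight    = c(74, 58, 96, 75, 102, 86, 47, 93, 69, 52),
                 height_m  = c(1.7, 1.5, 2.0, 1.75, 1.84, 1.9, 1.3, 1.5, 1.7, 1.66),
                 height_cm = c(170, 150, 200, 175, 184, 190, 130, 150, 170, 166))

fit_m  <- lm(weight ~ height_m,  data = df)
fit_cm <- lm(weight ~ height_cm, data = df)

# Neither single-predictor model has an undefined coefficient
anyNA(coef(fit_m))
anyNA(coef(fit_cm))

# The slope per metre equals 100 times the slope per centimetre
all.equal(coef(fit_m)[["height_m"]], 100 * coef(fit_cm)[["height_cm"]])

# Both models make the same predictions
all.equal(fitted(fit_m), fitted(fit_cm))
```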

## Summary

Congratulations on reading to the end of this tutorial!


Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!

##### Suf
Research Scientist

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!