
# How to Solve R Error: contrasts can be applied only to factors with 2 or more levels


This error occurs when you try to fit a regression model and one or more of the predictor variables is a factor or character variable with only one unique value.

You can solve this error by using the `lapply()` function to display the unique values of each variable and then dropping any variable that has only one unique value. For example:

```r
df <- data.frame(var1 = c(2, 4, 6, 8, 10),
                 var2 = c(3, 10, 30, 90, 120),
                 var3 = as.factor(17),
                 var4 = c(22, 34, 11, 99, 4))

lapply(df[c('var1', 'var2', 'var3')], unique)
```

This tutorial will go through the error in detail and how to solve it with code examples.

## Example

Let’s reproduce the error with an example. First, we will define a data frame with four columns.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

df
```

```
   i   j  k  l
1  2   3 17 22
2  4  10 17 34
3  6  30 17 11
4  8  90 17 99
5 10 120 17  4
```
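Before fitting a model, it can help to confirm the type of each column, because this error only involves factor and character predictors. The sketch below recreates the same `df` so it runs on its own and uses only base R functions (`sapply()`, `class()`, `nlevels()`, `levels()`).

```r
# Recreate the example data frame from this tutorial
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Class of each column: k is the only factor
col_classes <- sapply(df, class)
print(col_classes)

# The factor k has a single level, "17"
print(nlevels(df$k))
print(levels(df$k))
```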

Next, we will attempt to fit a multiple linear regression model with `i`, `j`, and `k` as predictor variables and `l` as the response variable.

```r
model <- lm(l ~ i + j + k, data = df)
```

Let’s run the code to see what happens:

```
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
```

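The error occurs because `k` is a factor with only one unique value (one level). R encodes a factor predictor using contrasts, which require at least two levels. Because there is no variation in `k`, R cannot build those contrasts and cannot fit the regression model.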
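To see why the fix targets factor and character variables specifically, the sketch below compares three single-valued predictors. The helper `fit_error()` and the extra columns `k_chr` and `k_num` are hypothetical, added only for illustration. A character column is converted to a factor when R builds the model matrix, so it triggers the same error, while a constant numeric column does not.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Hypothetical helper: return the error message if lm() fails, or NULL if it fits
fit_error <- function(formula, data) {
  tryCatch({
    lm(formula, data = data)
    NULL
  }, error = function(e) conditionMessage(e))
}

# 1. Single-level factor: raises the contrasts error
msg_factor <- fit_error(l ~ i + j + k, df)
print(msg_factor)

# 2. Single-value character column (illustrative): converted to a
#    factor during fitting, so it raises the same error
df$k_chr <- "17"
msg_char <- fit_error(l ~ i + j + k_chr, df)
print(msg_char)

# 3. Single-value numeric column (illustrative): no error; its coefficient
#    is NA because the column is perfectly collinear with the intercept
df$k_num <- 17
fit_num <- lm(l ~ i + j + k_num, data = df)
print(coef(fit_num))
```

A constant numeric predictor still adds nothing to the model, but R handles it by reporting an `NA` coefficient rather than stopping.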

### Solution

We can solve the error by identifying any factor variable that has only one unique value and dropping it. We can combine `sapply()` and `lapply()` to count the unique values in each variable:

```r
sapply(lapply(df, unique), length)
```

```
i j k l 
5 5 1 5 
```
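As an alternative sketch, `vapply()` performs the same count in one step and guarantees an integer result for every column. Counting unique values rather than using `nlevels()` is a reasonable choice here, because `lm()` drops unused factor levels before it builds contrasts.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Number of distinct values in each column, returned as a named integer vector
n_unique <- vapply(df, function(x) length(unique(x)), integer(1))
print(n_unique)
```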

We can also use the `lapply()` function to display the unique values of each predictor variable in the data frame.

```r
lapply(df[c('i', 'j', 'k')], unique)
```
```
$i
[1]  2  4  6  8 10

$j
[1]   3  10  30  90 120

$k
[1] 17
Levels: 17
```

We can use the `which()` function to find the variables that have fewer than two unique values. For example:

```r
which(sapply(df, function(x) length(unique(x)) < 2))
```

```
k 
3 
```

We have identified `k` as the only variable with one unique value. Therefore, we can drop this variable when fitting the regression model. Let’s look at the revised code:

```r
model <- lm(l ~ i + j, data = df)
summary(model)
```

Let’s run the code to get the model summary:

```
Call:
lm(formula = l ~ i + j, data = df)

Residuals:
      1       2       3       4       5 
-17.200   9.558  -8.092  56.310 -40.576 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  56.8529    93.8517   0.606    0.606
i            -9.9119    28.8471  -0.344    0.764
j             0.7237     1.7632   0.410    0.721

Residual standard error: 51.33 on 2 degrees of freedom
Multiple R-squared:  0.09107,	Adjusted R-squared:  -0.8179 
F-statistic: 0.1002 on 2 and 2 DF,  p-value: 0.9089
```

Dropping the `k` variable removes the error, and R fits the model successfully.
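With many predictors, you could automate the same check and refit. The sketch below is one possible approach, not the only way: `usable_predictors()` is a hypothetical helper that keeps predictors with at least two unique values, and base R's `reformulate()` builds the model formula from the remaining names.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Hypothetical helper: keep only predictors with at least two unique values
usable_predictors <- function(data, predictors) {
  n_unique <- vapply(data[predictors], function(x) length(unique(x)), integer(1))
  dropped <- predictors[n_unique < 2]
  if (length(dropped) > 0) {
    message("Dropping predictors with fewer than two unique values: ",
            paste(dropped, collapse = ", "))
  }
  predictors[n_unique >= 2]
}

predictors <- usable_predictors(df, c("i", "j", "k"))

# Build the formula l ~ i + j from the remaining names and fit the model
f <- reformulate(predictors, response = "l")
model <- lm(f, data = df)
print(f)
print(coef(model))
```

The coefficients match the manual fit above, because the helper drops exactly the one predictor, `k`, that we removed by hand.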

## Summary

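Congratulations on reading to the end of this tutorial! The error `contrasts can be applied only to factors with 2 or more levels` means that at least one factor or character predictor in your model has only one unique value. To fix it, count the unique values of each variable, for example with `sapply(lapply(df, unique), length)`, and drop any variable with fewer than two before fitting the model.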


Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!