*This error occurs when you try to fit a regression model and one or more of the predictor variables is a factor or character variable with only one unique value.*

*You can solve this error by using the `lapply()` function to display the unique values of each variable and then dropping any variable that has only one unique value.* For example:

    df <- data.frame(var1=c(2, 4, 6, 8, 10), var2=c(3, 10, 30, 90, 120), var3=as.factor(17), var4=c(22, 34, 11, 99, 4))
    lapply(df[c('var1', 'var2', 'var3')], unique)

*This tutorial will go through the error in detail and how to solve it with code examples.*

## Example

Let’s go through an example of reproducing the error. First, we will define a data frame with four columns.

    df <- data.frame(i=c(2, 4, 6, 8, 10), j=c(3, 10, 30, 90, 120), k=as.factor(17), l=c(22, 34, 11, 99, 4))
    df

       i   j  k  l
    1  2   3 17 22
    2  4  10 17 34
    3  6  30 17 11
    4  8  90 17 99
    5 10 120 17  4

Next, we will attempt to fit a multiple linear regression model with `i`, `j`, and `k` as predictor variables and `l` as the response variable.

    model <- lm(l ~ i + j + k, data=df)

Let’s run the code to see what happens:

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels

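The error occurs because `k` is a factor with only one unique value, and therefore only one level. To include a factor in a regression, R encodes it as contrasts between its levels, which requires at least two levels. As there is no variation in the `k` variable, R cannot build those contrasts and cannot fit the regression model.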
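To confirm this diagnosis, it can help to inspect the factor directly and capture the error rather than letting it stop the script. The following is a minimal sketch using the same data frame; the variable name `msg` and the `"no error"` placeholder string are illustrative choices, not part of the original example.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# The factor k has a single level, so there is nothing to contrast
nlevels(df$k)

# Capture the error message instead of halting execution
msg <- tryCatch({
  lm(l ~ i + j + k, data = df)
  "no error"
}, error = function(e) conditionMessage(e))

msg
```

Wrapping the call in `tryCatch()` is only a convenience for inspecting the message; it does not fix the underlying problem.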

### Solution

We can solve the error by identifying the variable that is a factor with only one unique value and then dropping it. We can use `sapply()` with `lapply()` to count the unique values for each variable:

    sapply(lapply(df, unique), length)

    i j k l
    5 5 1 5

We can use the `lapply()` function to display the unique values of each predictor variable in the data frame.

    lapply(df[c('i', 'j', 'k')], unique)

    $i
    [1]  2  4  6  8 10

    $j
    [1]   3  10  30  90 120

    $k
    [1] 17
    Levels: 17

We can use the `which()` function to determine which variables have fewer than two unique values. For example:

    which(sapply(df, function(x) length(unique(x)) < 2))

    k
    3
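The check above counts unique values in every column, including numeric ones. A constant numeric predictor does not raise this particular error (`lm()` reports an `NA` coefficient for it instead), so it can be useful to restrict the check to factor and character columns. Here is a minimal sketch; `is_single_level` and `bad_cols` are hypothetical names introduced for illustration, and ignoring missing values when counting is an assumption of this sketch.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Hypothetical helper: TRUE for a factor or character column with
# fewer than two distinct non-missing values
is_single_level <- function(x) {
  (is.factor(x) || is.character(x)) && length(unique(x[!is.na(x)])) < 2
}

# Names of the columns that would trigger the contrasts error
bad_cols <- names(df)[sapply(df, is_single_level)]
bad_cols
```

For the data frame in this example, `bad_cols` contains only `"k"`.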

We have identified `k` as the only variable with one unique value. Therefore, we can drop this variable when fitting the regression model. Let’s look at the revised code:

    model <- lm(l ~ i + j, data=df)
    summary(model)

Let’s run the code to get the model summary:

    Call:
    lm(formula = l ~ i + j, data = df)

    Residuals:
          1       2       3       4       5
    -17.200   9.558  -8.092  56.310 -40.576

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  56.8529    93.8517   0.606    0.606
    i            -9.9119    28.8471  -0.344    0.764
    j             0.7237     1.7632   0.410    0.721

    Residual standard error: 51.33 on 2 degrees of freedom
    Multiple R-squared:  0.09107,   Adjusted R-squared:  -0.8179
    F-statistic: 0.1002 on 2 and 2 DF,  p-value: 0.9089

With the `k` variable dropped, the model fits without raising the error.
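When a data frame has many columns, removing the offending variables by hand can become tedious. One possible approach, sketched below, is to filter the columns programmatically before fitting. The names `keep` and `df_clean` are illustrative, and the `l ~ .` formula (all remaining columns as predictors) is a swapped-in convenience rather than the formula used above.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Keep only the columns with at least two distinct values
keep <- sapply(df, function(x) length(unique(x)) >= 2)
df_clean <- df[, keep, drop = FALSE]

# l ~ . uses every remaining column except l as a predictor
model <- lm(l ~ ., data = df_clean)
names(coef(model))
```

Here the filter removes `k`, so the fitted model is equivalent to `lm(l ~ i + j, data=df)`. Note that this filter also removes constant numeric columns, which may or may not be desirable depending on the analysis.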

## Summary

Congratulations on reading to the end of this tutorial!

For further reading on R-related errors, go to the articles:

- How to Solve R Error in sort.int(x, na.last = na.last, decreasing = decreasing, …) : ‘x’ must be atomic
- How to Solve R Error: Arguments imply differing number of rows
- How to Solve R Error in FUN: invalid ‘type’ (character) of argument

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.