How to Solve R Error: contrasts can be applied only to factors with 2 or more levels


This error occurs when you try to fit a regression model in which one or more of the predictor variables are factor or character variables with only one unique value (a single level).
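
The sketch below illustrates this with a small hypothetical data frame, `df_demo`. A character predictor with a single value raises the same error, because lm() converts character columns to factors when it builds the model matrix. A numeric predictor with a single value does not raise it: lm() fits the model and reports an NA coefficient for that column, which is aliased with the intercept.

```r
# Hypothetical data: 'grp' is a character column and 'const' a numeric column,
# each holding a single repeated value
df_demo <- data.frame(x = c(1, 2, 3, 4, 5),
                      grp = rep("a", 5),
                      const = rep(7, 5),
                      y = c(2.1, 3.9, 6.2, 8.1, 9.8),
                      stringsAsFactors = FALSE)

# Single-valued character predictor: capture the error message instead of stopping
msg <- tryCatch({
  lm(y ~ x + grp, data = df_demo)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Single-valued numeric predictor: the model fits, but its coefficient is NA
fit_const <- lm(y ~ x + const, data = df_demo)
print(coef(fit_const))
```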

You can solve this error by using the lapply() function to display the unique values of each variable, then dropping any factor or character variable that has only one unique value. For example:

df <- data.frame(var1 = c(2, 4, 6, 8, 10),
                 var2 = c(3, 10, 30, 90, 120),
                 var3 = as.factor(17),
                 var4 = c(22, 34, 11, 99, 4))

# Show the unique values of each predictor; var3 has only one
lapply(df[c('var1', 'var2', 'var3')], unique)

This tutorial will go through the error in detail and show how to solve it with code examples.



Example

Let’s go through an example of reproducing the error. First, we will define a data frame with four columns.

df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

df
   i   j  k  l
1  2   3 17 22
2  4  10 17 34
3  6  30 17 11
4  8  90 17 99
5 10 120 17  4
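
Column k ends up with a single level because as.factor(17) is a length-one factor, which data.frame() recycles to fill all five rows. Rebuilding the same data frame and inspecting the column confirms this:

```r
# Rebuild the example data frame
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Inspect the factor column that will cause the error
print(class(df$k))    # "factor"
print(levels(df$k))   # "17"
print(nlevels(df$k))  # 1
```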

Next, we will attempt to fit a multiple linear regression model with i, j, and k as predictor variables and l as the response variable.

model <- lm(l ~ i + j + k, data=df)

Let’s run the code to see what happens:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

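The error occurs because k is a factor with only one unique value (a single level). R encodes each factor predictor with contrasts (dummy-variable columns), and a factor needs at least two levels to be encoded this way. Because k has no variation, lm() stops before it can fit the regression model.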
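
To confirm that the lack of variation is the cause, the sketch below gives k two levels. The values 17 and 18 in `df2` are made up for illustration. With two levels, lm() can build a treatment contrast for k, and the model fits with one coefficient for the k18 dummy variable.

```r
df2 <- data.frame(i = c(2, 4, 6, 8, 10),
                  j = c(3, 10, 30, 90, 120),
                  # Hypothetical values: k now has two levels, 17 and 18
                  k = factor(c(17, 17, 17, 18, 18)),
                  l = c(22, 34, 11, 99, 4))

# With two levels, the contrast for k can be built and the fit succeeds
model2 <- lm(l ~ i + j + k, data = df2)
print(coef(model2))
```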

Solution

We can solve the error by identifying any variable that is a factor with only one unique value, and then dropping it. We can use sapply() with lapply() to count the unique values of each variable:

sapply(lapply(df, unique), length)
i j k l 
5 5 1 5 
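
One caveat when counting with unique() is that it treats NA as a value. By default, lm() drops incomplete rows (the na.action option, normally na.omit). A column with one real value plus some NAs therefore still triggers the error, even though unique() reports two values for it. The following sketch uses a hypothetical data frame, `df_na`, and compares the naive count with a count of non-missing values only:

```r
# Hypothetical data: 'g' has one real value plus missing entries
df_na <- data.frame(x = c(1, 2, 3, 4, 5),
                    g = c("a", "a", "a", NA, NA),
                    y = c(1.2, 2.3, 3.1, 4.8, 5.0))

# Naive count: NA is counted as a distinct value
naive <- sapply(lapply(df_na, unique), length)

# Count only non-missing values
non_missing <- sapply(df_na, function(x) length(unique(x[!is.na(x)])))

print(naive)
print(non_missing)

# lm() removes rows 4 and 5, which leaves 'g' with a single level
msg <- tryCatch({
  lm(y ~ x + g, data = df_na)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```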

We can also use the lapply() function to display the unique values of each predictor variable in the data frame:

lapply(df[c('i', 'j', 'k')], unique)
$i
[1]  2  4  6  8 10

$j
[1]   3  10  30  90 120

$k
[1] 17

We can use the which() function to find the variables with fewer than two unique values. For example:

which(sapply(df, function(x) length(unique(x)) < 2))
k 
3
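
The which() check flags every column with fewer than two unique values. However, only factor and character predictors raise this particular error. The sketch below restricts the check to those column types and ignores missing values. The helper name `single_level_cats` is hypothetical.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Hypothetical helper: names of factor/character columns with fewer than
# two distinct non-missing values
single_level_cats <- function(data) {
  is_cat <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  n_distinct <- vapply(data, function(x) length(unique(x[!is.na(x)])), integer(1))
  names(data)[is_cat & n_distinct < 2]
}

print(single_level_cats(df))
```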

We have identified k as the only variable with one unique value. Therefore, we can drop this variable when fitting the regression model. Let’s look at the revised code:

model <- lm(l ~ i + j, data=df)
summary(model)

Let’s run the code to get the model summary:

Call:
lm(formula = l ~ i + j, data = df)

Residuals:
      1       2       3       4       5 
-17.200   9.558  -8.092  56.310 -40.576 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  56.8529    93.8517   0.606    0.606
i            -9.9119    28.8471  -0.344    0.764
j             0.7237     1.7632   0.410    0.721

Residual standard error: 51.33 on 2 degrees of freedom
Multiple R-squared:  0.09107,	Adjusted R-squared:  -0.8179 
F-statistic: 0.1002 on 2 and 2 DF,  p-value: 0.9089

After dropping the k variable, the model fits without raising the error.
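
When there are many predictors, the drop can be automated. The following minimal sketch assumes that l is the response and that every other column is a candidate predictor. It removes factor or character predictors with fewer than two distinct values, then builds the formula with reformulate().

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Assumption: 'l' is the response; all other columns are candidate predictors
response <- "l"
predictors <- setdiff(names(df), response)

# Keep a predictor unless it is categorical with fewer than two distinct values
keep <- vapply(df[predictors], function(x) {
  is_cat <- is.factor(x) || is.character(x)
  !(is_cat && length(unique(x[!is.na(x)])) < 2)
}, logical(1))

dropped <- predictors[!keep]
fml <- reformulate(predictors[keep], response = response)
print(dropped)
print(fml)

# Fit the model using only the remaining predictors
model_auto <- lm(fml, data = df)
print(coef(model_auto))
```

Building the formula from the remaining names avoids editing it by hand each time the data changes.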

Summary

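Congratulations on reading to the end of this tutorial! To recap: this error means a factor or character predictor in your model has only one unique value. Find that variable by counting the unique values in each column, then drop it from the model formula.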


Have fun and happy researching!

Senior Advisor, Data Science

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
