How to Solve R Error: contrasts can be applied only to factors with 2 or more levels

This error occurs when you try to fit a regression model and one or more of the predictor variables is a factor or character variable with only one unique value.
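As a quick illustration of the character case, here is a minimal sketch. The data frame `dat` and its columns `y`, `x`, and `group` are made up for this example and are not part of the tutorial's data. The call to lm() is wrapped in tryCatch() so the error message is captured rather than stopping the script.

```r
# Hypothetical data: 'group' holds a single unique value in every row
dat <- data.frame(y = c(1.2, 3.4, 2.2, 5.1),
                  x = c(1, 2, 3, 4),
                  group = "a")

# Capture the error message instead of halting the script
msg <- tryCatch({
  lm(y ~ x + group, data = dat)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

The exact wording of the captured message can vary with your R locale.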

You can solve this error by using the lapply() function to display the unique values of each variable and then dropping any variable that has only one unique value. For example:

df <- data.frame(var1 = c(2, 4, 6, 8, 10),
                 var2 = c(3, 10, 30, 90, 120),
                 var3 = as.factor(17),
                 var4 = c(22, 34, 11, 99, 4))

lapply(df[c('var1', 'var2', 'var3')], unique)

This tutorial will go through the error in detail and how to solve it with code examples.


Example

Let’s go through an example of reproducing the error. First, we will define a data frame with four columns.

df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

df
   i   j  k  l
1  2   3 17 22
2  4  10 17 34
3  6  30 17 11
4  8  90 17 99
5 10 120 17  4

Next, we will attempt to fit a multiple linear regression model with i, j, and k as predictor variables and l as the response variable.

model <- lm(l ~ i + j + k, data=df)

Let’s run the code to see what happens:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

The error occurs because k is a factor with only one unique value. With no variation in k, R cannot construct contrasts for it, so it cannot fit the regression model.
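One way to confirm this diagnosis is to count the levels of each factor column with nlevels(). The sketch below rebuilds the same data frame so that it runs on its own; the name `level_counts` is only illustrative.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Number of levels for factor columns; NA for columns that are not factors
level_counts <- sapply(df, function(x) {
  if (is.factor(x)) nlevels(x) else NA_integer_
})

print(level_counts)
```

Any factor column reporting a single level, here k, is a candidate for causing the error.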

Solution

We can solve the error by identifying the variable that is a factor with only one unique value and then dropping it. We can use sapply() with lapply() to count the unique values for each variable:

sapply(lapply(df, unique), length)
i j k l 
5 5 1 5 

We can use the lapply() function to display each of the unique values for each of the predictor variables in the data frame.

lapply(df[c('i', 'j', 'k')], unique)
$i
[1]  2  4  6  8 10

$j
[1]   3  10  30  90 120

$k
[1] 17

We can use the which() function to determine which variables have fewer than two unique values. For example:

which(sapply(df, function(x) length(unique(x)) < 2))
k 
3
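If there are many columns, the same check can be used to drop the constant ones programmatically instead of editing the formula by hand. The following is a minimal sketch under that assumption; the names `keep`, `df_clean`, and `model_clean` are illustrative only.

```r
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# TRUE for columns that have at least two distinct values
keep <- sapply(df, function(x) length(unique(x)) >= 2)

# Keep only those columns; drop = FALSE preserves the data frame structure
df_clean <- df[, keep, drop = FALSE]
print(names(df_clean))

# Fit the model on all remaining predictors
model_clean <- lm(l ~ ., data = df_clean)
```

Note that unique() counts NA as a distinct value, so for columns with missing values you may prefer length(unique(na.omit(x))) in the check.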

We have identified k as the only variable with one unique value. Therefore, we can drop this variable when fitting the regression model. Let’s look at the revised code:

model <- lm(l ~ i + j, data=df)
summary(model)

Let’s run the code to get the model summary:

Call:
lm(formula = l ~ i + j, data = df)

Residuals:
      1       2       3       4       5 
-17.200   9.558  -8.092  56.310 -40.576 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  56.8529    93.8517   0.606    0.606
i            -9.9119    28.8471  -0.344    0.764
j             0.7237     1.7632   0.410    0.721

Residual standard error: 51.33 on 2 degrees of freedom
Multiple R-squared:  0.09107,	Adjusted R-squared:  -0.8179 
F-statistic: 0.1002 on 2 and 2 DF,  p-value: 0.9089

With the k variable dropped, the model fits without raising the error.

Summary

Congratulations on reading to the end of this tutorial!

Have fun and happy researching!

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research.