This error occurs when you try to fit a regression model and one or more of the predictor variables are either factor or character and have only one unique value.
You can solve this error by using the lapply() function to display the unique values of each variable, then dropping any variable with only one unique value. For example,
df <- data.frame(var1=c(2, 4, 6, 8, 10), var2=c(3, 10, 30, 90, 120), var3=as.factor(17), var4=c(22, 34, 11, 99, 4))
lapply(df[c('var1', 'var2', 'var3')], unique)
This tutorial will go through the error in detail and how to solve it with code examples.
Let’s go through an example of reproducing the error. First, we will define a data frame with four columns.
df <- data.frame(i=c(2, 4, 6, 8, 10), j=c(3, 10, 30, 90, 120), k=as.factor(17), l=c(22, 34, 11, 99, 4))
df
   i   j  k  l
1  2   3 17 22
2  4  10 17 34
3  6  30 17 11
4  8  90 17 99
5 10 120 17  4
Next, we will attempt to fit a multiple linear regression model with i, j, and k as the predictor variables and l as the response variable.
model <- lm(l ~ i + j + k, data=df)
Let’s run the code to see what happens:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
The error occurs because k is a factor with only one unique value. As there is no variation in the k variable, R cannot build contrasts for it and so cannot fit the regression model.
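To see the cause more concretely, here is a minimal sketch that checks the number of levels of k and then tries the same fit with a character column in place of the factor. The names df_chr and msg are hypothetical helpers introduced only for this illustration, and the tryCatch() wrapper is just a convenient way to capture the message without stopping the script.

```r
# Same data frame as in the example above
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# k has a single level, so there is nothing to contrast it against
nlevels(df$k)

# Hypothetical variant: replace k with a character column holding one value.
# lm() converts character predictors to factors, so the same problem arises.
df_chr <- df
df_chr$k <- rep("a", nrow(df_chr))

# Capture the error message instead of halting execution
msg <- tryCatch({
  lm(l ~ i + j + k, data = df_chr)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

If the sketch behaves as expected, msg should contain the same "2 or more levels" message for the character column as for the factor.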
We can solve the error by identifying the variable that is a factor with only one unique value and then dropping it. We can combine lapply() and sapply() to count the unique values for each variable:
sapply(lapply(df, unique), length)
i j k l
5 5 1 5
We can use the lapply() function to display each of the unique values for each of the predictor variables in the data frame.
lapply(df[c('i', 'j', 'k')], unique)
$i
[1]  2  4  6  8 10

$j
[1]   3  10  30  90 120

$k
[1] 17
We can use the which() function to determine which variables have fewer than two unique values. For example:
which(sapply(df, function(x) length(unique(x))<2))
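The check above flags any column with fewer than two unique values, including numeric ones. A numeric column with a single value does not raise this particular error (lm() reports an NA coefficient for it instead), so it can be useful to restrict the check to factor and character columns. The following is a minimal sketch under that assumption; is_categorical, n_unique, and single_valued are hypothetical names used only for illustration.

```r
# Same data frame as in the example above
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# Flag the columns that lm() would treat as categorical
is_categorical <- sapply(df, function(x) is.factor(x) || is.character(x))

# Count the unique values in those columns only
n_unique <- sapply(df[is_categorical], function(x) length(unique(x)))

# Names of the categorical columns with fewer than two unique values
single_valued <- names(n_unique)[n_unique < 2]
print(single_valued)
```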
We have identified k as the only variable with one unique value. Therefore, we can drop this variable when fitting the regression model. Let’s look at the revised code:
model <- lm(l ~ i + j, data=df)
summary(model)
Let’s run the code to get the model summary:
Call:
lm(formula = l ~ i + j, data = df)

Residuals:
      1       2       3       4       5
-17.200   9.558  -8.092  56.310 -40.576

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  56.8529    93.8517   0.606    0.606
i            -9.9119    28.8471  -0.344    0.764
j             0.7237     1.7632   0.410    0.721

Residual standard error: 51.33 on 2 degrees of freedom
Multiple R-squared:  0.09107,	Adjusted R-squared:  -0.8179
F-statistic: 0.1002 on 2 and 2 DF,  p-value: 0.9089
The error no longer occurs once we drop the k variable from the model.
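With many columns, dropping single-valued variables by hand becomes tedious. As a rough sketch, the drop can be automated by keeping only the columns with at least two unique values and fitting on what remains. The names keep, df_clean, and model_auto are hypothetical, and the formula l ~ . assumes that every remaining column other than l should be used as a predictor.

```r
# Same data frame as in the example above
df <- data.frame(i = c(2, 4, 6, 8, 10),
                 j = c(3, 10, 30, 90, 120),
                 k = as.factor(17),
                 l = c(22, 34, 11, 99, 4))

# TRUE for columns that have at least two unique values
keep <- sapply(df, function(x) length(unique(x)) >= 2)

# Keep only those columns; drop = FALSE preserves the data frame structure
df_clean <- df[, keep, drop = FALSE]
names(df_clean)

# Fit the model using all remaining columns except l as predictors
model_auto <- lm(l ~ ., data = df_clean)
coef(model_auto)
```

For this data frame the automated fit should match the manual lm(l ~ i + j, data=df) fit, since k is the only column removed.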
Congratulations on reading to the end of this tutorial!
For further reading on R-related errors, go to the articles:
- How to Solve R Error in sort.int(x, na.last = na.last, decreasing = decreasing, …) : ‘x’ must be atomic
- How to Solve R Error: Arguments imply differing number of rows
- How to Solve R Error in FUN: invalid ‘type’ (character) of argument
Go to the online courses page on R to learn more about coding in R for data science and machine learning.
Have fun and happy researching!
Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research.