This error occurs when you try to fit a regression model and one or more of the predictor variables is a factor or character variable with only one unique value.
You can solve this error by using the lapply() function to display the unique values for each variable and then dropping any variable with only one unique value. For example,

df <- data.frame(var1=c(2, 4, 6, 8, 10),
                 var2=c(3, 10, 30, 90, 120),
                 var3=as.factor(17),
                 var4=c(22, 34, 11, 99, 4))

lapply(df[c('var1', 'var2', 'var3')], unique)
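In this example data frame, var3 is the only variable with a single unique value, so one fix is simply to leave it out of the model formula. Here is a minimal sketch, assuming var4 is the response variable:

# var3 has only one unique value, so exclude it from the formula
model <- lm(var4 ~ var1 + var2, data=df)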
This tutorial will go through the error in detail and show how to solve it with code examples.
Example
Let’s go through an example of reproducing the error. First, we will define a data frame with four columns.
df <- data.frame(i=c(2, 4, 6, 8, 10),
                 j=c(3, 10, 30, 90, 120),
                 k=as.factor(17),
                 l=c(22, 34, 11, 99, 4))

df
   i   j  k  l
1  2   3 17 22
2  4  10 17 34
3  6  30 17 11
4  8  90 17 99
5 10 120 17  4
Next, we will attempt to fit a multiple linear regression model with i, j, and k as the predictor variables and l as the response variable.
model <- lm(l ~ i + j + k, data=df)
Let’s run the code to see what happens:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
The error occurs because k is a factor with only one unique value. As there is no variation in the k variable, R cannot build contrasts for it and cannot fit the regression model.
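We can confirm this by inspecting the factor directly; levels() and nlevels() show that k has a single level:

# k is a factor with a single level, so R cannot build contrasts for it
nlevels(df$k)
# [1] 1

levels(df$k)
# [1] "17"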
Solution
We can solve the error by identifying the variable that is a factor with only one unique value and then dropping it. We can use sapply() with lapply() to count the unique values for each variable:
sapply(lapply(df, unique), length)
i j k l 
5 5 1 5 
We can use the lapply() function to display the unique values for each of the predictor variables in the data frame.
lapply(df[c('i', 'j', 'k')], unique)
$i
[1]  2  4  6  8 10

$j
[1]   3  10  30  90 120

$k
[1] 17
We can use the which() function to determine which variables have fewer than two unique values. For example:
which(sapply(df, function(x) length(unique(x))<2))
k 
3 
We have identified k as the only variable with one unique value. Therefore, we can drop this variable when fitting the regression model. Let’s look at the revised code:
model <- lm(l ~ i + j, data=df)

summary(model)
Let’s run the code to get the model summary:
Call:
lm(formula = l ~ i + j, data = df)

Residuals:
      1       2       3       4       5 
-17.200   9.558  -8.092  56.310 -40.576 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  56.8529    93.8517   0.606    0.606
i            -9.9119    28.8471  -0.344    0.764
j             0.7237     1.7632   0.410    0.721

Residual standard error: 51.33 on 2 degrees of freedom
Multiple R-squared:  0.09107,	Adjusted R-squared:  -0.8179 
F-statistic: 0.1002 on 2 and 2 DF,  p-value: 0.9089
Now that we have dropped the k variable, the error is no longer raised.
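For a data frame with many columns, we could also drop any single-level variables programmatically before fitting, rather than checking them one by one. Here is a minimal sketch that keeps only the columns with at least two unique values (the df_clean name is just for illustration):

# Keep only columns with at least two unique values (this drops k)
df_clean <- df[, sapply(df, function(x) length(unique(x)) >= 2)]

model <- lm(l ~ i + j, data=df_clean)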
Summary
Congratulations on reading to the end of this tutorial!
For further reading on R-related errors, go to the articles:
- How to Solve R Error in sort.int(x, na.last = na.last, decreasing = decreasing, …) : ‘x’ must be atomic
- How to Solve R Error: Arguments imply differing number of rows
- How to Solve R Error in FUN: invalid ‘type’ (character) of argument
Go to the online courses page on R to learn more about coding in R for data science and machine learning.
Have fun and happy researching!