How to Solve R Error model.frame.default: ‘data’ must be a data.frame, environment, or list

In R, if you pass something other than a data frame to the newdata argument of predict() when generating predictions from a model, you will raise the error model.frame.default: ‘data’ must be a data.frame, environment, or list. A common cause is passing a single column, such as df$x, which is a vector rather than a data frame. You can solve this error by ensuring the data you pass to the predict() function is a data frame that contains the predictor columns used to fit the model.

This tutorial will go through the error in detail and how to solve it with code examples.



Example:

Consider the following random data:

set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)
head(df)

In the above code, we created a data frame with two columns. The column x contains the values for the predictor variable and the column y contains the values for the response variable.

           x          y
1  1.2629543  4.3022713
2 -0.3262334 -0.1395184
3  1.3297993  5.0895748
4  1.2724293  4.8114472
5  0.4146414  1.7231037
6 -1.5399500 -4.5819073

Next, we will split the data frame in half to create two data frames, one for fitting a linear regression model and the other for prediction.

data_1 <- data.frame(x, y)[1:500,]
data_2 <- data.frame(x, y)[500:1000,]
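As a quick sanity check on the split above, the sketch below counts the rows in each piece and looks for rows that appear in both. The name data_2_disjoint is a hypothetical alternative introduced only for illustration; the rest of this tutorial keeps using data_2 as defined above.

```r
# Rebuild the data exactly as in the tutorial
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)

data_1 <- df[1:500, ]
data_2 <- df[500:1000, ]

nrow(data_1)  # 500
nrow(data_2)  # 501, because 500:1000 includes both endpoints

# Row names that occur in both pieces
intersect(rownames(data_1), rownames(data_2))  # "500"

# Hypothetical disjoint alternative (not used in the rest of the tutorial)
data_2_disjoint <- df[501:1000, ]
nrow(data_2_disjoint)  # 500
```

Whether a one-row overlap matters depends on how strictly you need to separate fitting data from prediction data; for this error demonstration it makes no difference.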

Now that we have our two data frames (500 rows in data_1 and 501 rows in data_2, because both index ranges include row 500), we can fit a linear model using lm() and obtain the summary statistics using summary().

model <- lm(y~x, data_1)
summary(model)
Call:
lm(formula = y ~ x, data = data_1)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49983 -0.25289 -0.01052  0.26499  0.53284 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.48791    0.01311   37.22   <2e-16 ***
x            3.11686    0.01326  235.09   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2931 on 498 degrees of freedom
Multiple R-squared:  0.9911,	Adjusted R-squared:  0.9911 
F-statistic: 5.527e+04 on 1 and 498 DF,  p-value: < 2.2e-16

Let’s try to get predictions using the x column from the test data frame:

predict(model, newdata=data_2$x)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  'data' must be a data.frame, environment, or list

The R interpreter raises the error because the newdata argument is not a data frame. The expression data_2$x extracts a single column as a plain numeric vector, whereas predict() needs a data frame so that it can look up the predictor x by name.
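To see the difference for yourself, the sketch below inspects what data_2$x actually is and captures the error message with tryCatch() instead of halting the script. The variable msg is a name chosen here for illustration. The exact wording of the message may differ between R versions, so no expected text is shown for it.

```r
# Rebuild the data and model from the tutorial
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]
model <- lm(y ~ x, data_1)

class(data_2$x)          # "numeric"
is.data.frame(data_2$x)  # FALSE
is.data.frame(data_2)    # TRUE

# Capture the error message rather than stopping execution.
# If no error occurred, msg would be NA.
msg <- tryCatch(
  {
    predict(model, newdata = data_2$x)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
msg
```

A check such as is.data.frame() before calling predict() is a cheap way to catch this mistake early in a longer script.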

Solution

We can solve this error by passing the entire data frame as the newdata argument. The predict() function will return the predicted values for the test data frame.

predict(model, newdata=data_2)

Let’s run the code and show the first six predicted values.

         500          501          502          503          504          505 
-1.308982962 -1.464393709  1.988164730  5.771834227 -5.001588999  1.105088360 
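If you only have the predictor values as a bare vector, two other approaches should also work. The sketch below assumes the same model and data as above; the names new_x, pred_wrapped, pred_subset, and pred_full are introduced here for illustration. The key point is that the column in the new data frame must have the same name as the predictor in the formula, here x.

```r
# Rebuild the data and model from the tutorial
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]
model <- lm(y ~ x, data_1)

# Option 1: wrap the vector in a data frame whose column is named
# after the predictor in the formula y ~ x
new_x <- data.frame(x = data_2$x)
pred_wrapped <- predict(model, newdata = new_x)

# Option 2: single-bracket indexing returns a one-column data frame,
# unlike data_2$x, which returns a vector
pred_subset <- predict(model, newdata = data_2["x"])

# Reference: the solution shown above
pred_full <- predict(model, newdata = data_2)

# The values agree; only the names of the result differ,
# so we drop them before comparing
all.equal(unname(pred_wrapped), unname(pred_full))  # TRUE
all.equal(unname(pred_subset), unname(pred_full))   # TRUE
```

Option 2 keeps the original row names on the predictions, which can make it easier to match them back to the source rows.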

Summary

Congratulations on reading to the end of this tutorial!

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
