In R, if you pass something other than a data frame to the predict() function when evaluating a model, R raises the error model.frame.default: ‘data’ must be a data.frame, environment, or list. The predict() function expects its newdata argument to be a data frame. You can solve this error by ensuring the data you pass to predict() is a data frame.

This tutorial will go through the error in detail and how to solve it with code examples.

Consider the following random data:

set.seed(0)  # fix the random seed so the example is reproducible
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)

In the above code, we created a data frame with two columns: x holds the values of the predictor variable and y holds the values of the response variable. The first six rows, as returned by head(df), are:

           x          y
1  1.2629543  4.3022713
2 -0.3262334 -0.1395184
3  1.3297993  5.0895748
4  1.2724293  4.8114472
5  0.4146414  1.7231037
6 -1.5399500 -4.5819073

Next, we will split the data frame in half to create two data frames, one for fitting a linear regression model and the other for prediction.

data_1 <- df[1:500, ]
data_2 <- df[500:1000, ]
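
As a quick sanity check on the split, the sketch below counts the rows in each piece and looks for row names that appear in both. It regenerates the data so that it runs on its own; the seed value and the name shared_rows are assumptions added for illustration.

```r
# Regenerate the example data (the seed is an assumption, used only for reproducibility)
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)

data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]

nrow(data_1)  # 500
nrow(data_2)  # 501, because 500:1000 includes both endpoints

# Row names that appear in both pieces (hypothetical helper variable)
shared_rows <- intersect(rownames(data_1), rownames(data_2))
shared_rows   # "500"
```

Because 1:500 and 500:1000 both include row 500, that row ends up in both pieces. Using 501:1000 for the second piece would make the split disjoint.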

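We now have two data frames: data_1 with 500 rows and data_2 with 501 rows (the ranges 1:500 and 500:1000 both include row 500, so use 501:1000 if you need a strictly disjoint split). We can fit a linear model on data_1 using lm() and obtain the summary statistics using summary().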

model <- lm(y ~ x, data = data_1)
summary(model)

Call:
lm(formula = y ~ x, data = data_1)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49983 -0.25289 -0.01052  0.26499  0.53284 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.48791    0.01311   37.22   <2e-16 ***
x            3.11686    0.01326  235.09   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2931 on 498 degrees of freedom
Multiple R-squared:  0.9911,	Adjusted R-squared:  0.9911 
F-statistic: 5.527e+04 on 1 and 498 DF,  p-value: < 2.2e-16
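
To see how the fit relates to the way the data were generated, the following sketch extracts the coefficients with coef(). It assumes a fixed seed (an addition for reproducibility), and the name estimates is hypothetical; the exact values will differ with other seeds.

```r
# Regenerate the training data (the seed is an assumption)
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]

model <- lm(y ~ x, data = data_1)

# Named numeric vector holding the intercept and the slope for x
estimates <- coef(model)
estimates

# The data were generated as y = 3.1 * x + runif(1000), so the slope should be
# close to 3.1 and the intercept close to 0.5, the mean of a uniform(0, 1) draw
abs(estimates[["x"]] - 3.1)
abs(estimates[["(Intercept)"]] - 0.5)
```

Both differences should be small, which is consistent with the estimates of roughly 3.12 and 0.49 in the summary output.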

Let’s try to get predictions using the x column from the test data frame:

predict(model, newdata=data_2$x)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  'data' must be a data.frame, environment, or list

R raises the error because the newdata argument has the wrong type. The predict() function passes newdata on to model.frame() as its data argument, which must be a data frame, environment, or list. The expression data_2$x extracts a plain numeric vector, not a data frame.
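
One way to confirm the diagnosis is to inspect the object before passing it to predict(). The following is a minimal sketch that rebuilds the test data so it runs on its own; the seed and the name newdata_vec are assumptions added for illustration.

```r
# Rebuild the test data frame (the seed is an assumption)
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_2 <- data.frame(x, y)[500:1000, ]

# `$` extracts the column as a plain numeric vector
newdata_vec <- data_2$x
class(newdata_vec)          # "numeric"
is.data.frame(newdata_vec)  # FALSE

# Single-bracket indexing by column name keeps the data frame structure
is.data.frame(data_2["x"])  # TRUE
```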


We can solve this error by passing the entire data frame as the newdata argument. The predict() function then returns the predicted values for the test data frame.

predict(model, newdata=data_2)

Let’s run the code and show the first six predicted values.

         500          501          502          503          504          505 
-1.308982962 -1.464393709  1.988164730  5.771834227 -5.001588999  1.105088360 
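
If you only want to supply the predictor, a data frame containing just that column also works, as long as the column is named x to match the formula. The sketch below compares three ways of building newdata; the seed and the names pred_full, pred_col, and pred_wrapped are assumptions added for illustration.

```r
# Rebuild the data and the model (the seed is an assumption)
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]
model <- lm(y ~ x, data = data_1)

# 1. Pass the whole test data frame; the unused y column is ignored
pred_full <- predict(model, newdata = data_2)

# 2. Keep only the x column, still as a data frame
pred_col <- predict(model, newdata = data_2["x"])

# 3. Wrap the vector in a new data frame with a column named x
pred_wrapped <- predict(model, newdata = data.frame(x = data_2$x))

# Options 1 and 2 keep the original row names; option 3 renumbers from 1,
# so the names are dropped before comparing the values
all.equal(pred_full, pred_col)
all.equal(unname(pred_full), unname(pred_wrapped))
```

Both comparisons should return TRUE, since all three calls give the model the same x values.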


Congratulations on reading to the end of this tutorial!

Have fun and happy researching!