
How to Solve R Error model.frame.default: ‘data’ must be a data.frame, environment, or list


In R, if you pass a plain vector as the newdata argument of the predict() function when evaluating a model, you will raise the error model.frame.default: ‘data’ must be a data.frame, environment, or list. For a model fitted with a formula, predict() expects newdata to be a data frame whose column names match the predictor variables. You can solve this error by ensuring the data you pass to the predict() function is a data frame.

This tutorial will go through the error in detail and how to solve it with code examples.



Example:

Consider the following random data:

set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)
head(df)

The code above creates a data frame with two columns: x contains the values of the predictor variable and y contains the values of the response variable. The call to head(df) prints the first six rows:

           x          y
1  1.2629543  4.3022713
2 -0.3262334 -0.1395184
3  1.3297993  5.0895748
4  1.2724293  4.8114472
5  0.4146414  1.7231037
6 -1.5399500 -4.5819073

Next, we will split the data frame in half to create two data frames, one for fitting a linear regression model and the other for prediction.

data_1 <- df[1:500, ]
data_2 <- df[500:1000, ]
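As a quick sanity check on the split, the sketch below counts the rows in each piece and looks for rows that appear in both. The names train_rows, clean_train, and clean_test are hypothetical and only illustrate a non-overlapping alternative; they are not used for the outputs shown in this tutorial.

```r
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)

# Split used in this tutorial: 1:500 and 500:1000 both include row 500
data_1 <- df[1:500, ]
data_2 <- df[500:1000, ]

nrow(data_1)   # 500
nrow(data_2)   # 501
intersect(rownames(data_1), rownames(data_2))   # "500"

# Hypothetical non-overlapping alternative: negative indexing drops the
# training rows, so no row can land in both pieces
train_rows <- 1:500
clean_train <- df[train_rows, ]
clean_test <- df[-train_rows, ]
nrow(clean_test)   # 500
```

Negative indexing is one simple way to guarantee the two pieces are disjoint; the rest of this tutorial keeps the original data_1 and data_2.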

Now that we have our two data frames, we can fit a linear model on data_1 using lm() and obtain the summary statistics using summary(). Note that data_1 has 500 rows while data_2 has 501, because the ranges 1:500 and 500:1000 both include row 500.

model <- lm(y ~ x, data = data_1)
summary(model)
Call:
lm(formula = y ~ x, data = data_1)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49983 -0.25289 -0.01052  0.26499  0.53284 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.48791    0.01311   37.22   <2e-16 ***
x            3.11686    0.01326  235.09   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2931 on 498 degrees of freedom
Multiple R-squared:  0.9911,	Adjusted R-squared:  0.9911 
F-statistic: 5.527e+04 on 1 and 498 DF,  p-value: < 2.2e-16
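Because the data is simulated, the fitted coefficients can be compared against the values used to generate it. The sketch below assumes the same seed and split as above; the truth vector is an illustrative helper, holding the slope 3.1 from the simulation and an intercept of 0.5, which is the mean of the uniform noise added by runif().

```r
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
model <- lm(y ~ x, data = data_1)

# Extract the fitted intercept and slope as a named numeric vector
estimates <- coef(model)

# Values implied by the simulation (illustrative helper, not part of the model)
truth <- c("(Intercept)" = 0.5, x = 3.1)

# Side-by-side comparison of estimates and simulated values
round(cbind(estimate = estimates, truth = truth), 3)

# 95% confidence intervals for both coefficients
confint(model)
```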

Let’s try to get predictions using the x column from the test data frame:

predict(model, newdata=data_2$x)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  'data' must be a data.frame, environment, or list

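The R interpreter raises the error because the newdata argument is not a data frame. The expression data_2$x extracts the column as a plain numeric vector, but predict() passes newdata on to model.frame(), which needs a data frame, environment, or list so that it can look up the variable x by name.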
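One way to confirm the diagnosis is to inspect the object before passing it to predict(). The sketch below rebuilds the example data and uses tryCatch() to capture the error instead of stopping the script; new_x and result are illustrative names.

```r
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)
data_1 <- df[1:500, ]
data_2 <- df[500:1000, ]
model <- lm(y ~ x, data = data_1)

# The $ operator returns the column itself, not a data frame
new_x <- data_2$x
class(new_x)           # "numeric"
is.data.frame(new_x)   # FALSE

# Capture the error as a condition object so the script keeps running
result <- tryCatch(
  predict(model, newdata = new_x),
  error = function(e) e
)
inherits(result, "error")
conditionMessage(result)
```

Checking is.data.frame() on the object you are about to pass as newdata is a cheap guard against this error.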

Solution

We can solve this error by passing the entire data frame as the newdata argument. The predict() function looks up the x column by name and returns the predicted values for the test data frame.

predict(model, newdata=data_2)

Let’s run the code and show the first six predicted values.

         500          501          502          503          504          505 
-1.308982962 -1.464393709  1.988164730  5.771834227 -5.001588999  1.105088360 
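If only the predictor values are available as a vector, a possible alternative is to wrap them in a data frame whose column name matches the predictor in the formula (x here). The sketch below, which uses illustrative variable names, checks that this gives the same predictions as passing the whole data frame.

```r
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)
data_1 <- df[1:500, ]
data_2 <- df[500:1000, ]
model <- lm(y ~ x, data = data_1)

# Fix from this tutorial: pass the whole data frame
pred_full <- predict(model, newdata = data_2)

# Alternative 1: wrap the vector in a data frame with a matching column name
pred_wrapped <- predict(model, newdata = data.frame(x = data_2$x))

# Alternative 2: single-bracket indexing by name keeps the data frame structure
pred_subset <- predict(model, newdata = data_2["x"])

# The row names differ between the results, so compare the values only
all.equal(unname(pred_full), unname(pred_wrapped))
all.equal(unname(pred_full), unname(pred_subset))
```

Note that data_2["x"] returns a one-column data frame, whereas data_2$x and data_2[, "x"] return a plain vector and would trigger the error again.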

Summary

Congratulations on reading to the end of this tutorial!


Have fun and happy researching!


Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research.