In R, if you pass the wrong kind of object as the newdata argument of the predict() function when evaluating a model, you will raise the error model.frame.default: ‘data’ must be a data.frame, environment, or list. The predict() function expects newdata to be a data frame. You can solve this error by ensuring the data you pass to the predict() function is a data frame.
This tutorial will go through the error in detail and how to solve it with code examples.
Example:
Consider the following random data:
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)
head(df)
In the above code, we created a data frame with two columns. The column x contains the values for the predictor variable and the column y contains the values for the response variable.
           x          y
1  1.2629543  4.3022713
2 -0.3262334 -0.1395184
3  1.3297993  5.0895748
4  1.2724293  4.8114472
5  0.4146414  1.7231037
6 -1.5399500 -4.5819073
Next, we will split the data frame in half to create two data frames, one for fitting a linear regression model and the other for prediction.
data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]
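Because an R range such as 500:1000 includes both endpoints, it is easy to end up with one more row than intended. Below is a minimal sketch for checking the row counts with nrow(). It rebuilds the same data so it runs on its own, and the names overlap_split and clean_split are hypothetical, introduced only for this check.

```r
# Rebuild the tutorial's data so this snippet runs on its own
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)

# 500:1000 includes both endpoints, so it selects 501 rows
overlap_split <- df[500:1000, ]
# 501:1000 selects exactly the second half of the rows
clean_split <- df[501:1000, ]

nrow(df[1:500, ])    # 500
nrow(overlap_split)  # 501
nrow(clean_split)    # 500
```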
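Now that we have our two data frames, we can fit a linear model using lm() and obtain the summary statistics using summary(). (Note that 1:500 selects 500 rows but 500:1000 selects 501, so row 500 ends up in both data frames; using 501:1000 would give a clean 500/500 split.)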
model <- lm(y ~ x, data_1)
summary(model)
Call:
lm(formula = y ~ x, data = data_1)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49983 -0.25289 -0.01052  0.26499  0.53284 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.48791    0.01311   37.22   <2e-16 ***
x            3.11686    0.01326  235.09   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2931 on 498 degrees of freedom
Multiple R-squared:  0.9911,	Adjusted R-squared:  0.9911 
F-statistic: 5.527e+04 on 1 and 498 DF,  p-value: < 2.2e-16
Let’s try to get predictions by passing the x column from the test data frame:
predict(model, newdata=data_2$x)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 'data' must be a data.frame, environment, or list
R raises the error because the newdata argument is not in the expected form. The predict() function expects a data frame containing a column for each predictor in the model (here, a column named x), but data_2$x is a plain numeric vector, so model.frame.default() rejects it.
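One way to see the difference is to compare what is being passed. In the sketch below, data_2$x is a bare numeric vector and triggers the error, while data_2[, "x", drop = FALSE] keeps x as a one-column data frame whose column name matches the predictor in the formula y ~ x. The names err, x_only, p_full and p_col are hypothetical, chosen for illustration.

```r
# Rebuild the tutorial's model so this snippet runs on its own
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]
model <- lm(y ~ x, data_1)

# A bare vector is not a data frame, environment, or list
class(data_2$x)                       # "numeric"
err <- tryCatch(predict(model, newdata = data_2$x),
                error = function(e) e)
conditionMessage(err)                 # the model.frame.default error message

# drop = FALSE keeps x as a one-column data frame with the name predict() needs
x_only <- data_2[, "x", drop = FALSE]
class(x_only)                         # "data.frame"

# Predictions from the one-column data frame match those from all of data_2
p_full <- predict(model, newdata = data_2)
p_col  <- predict(model, newdata = x_only)
all.equal(p_full, p_col)              # TRUE
```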
Solution
We can solve this error by passing the entire data frame as the newdata argument. predict() then looks up the x column by name and returns a predicted value for each row of the test data frame.
predict(model, newdata=data_2)
Running the code prints a predicted value for every row of data_2; the first six are shown below.
         500          501          502          503          504          505 
-1.308982962 -1.464393709  1.988164730  5.771834227 -5.001588999  1.105088360 
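predict() also works on a brand-new data frame, as long as it has a column named after the predictor (x here). As a sanity check, for a simple linear model each prediction equals intercept + slope * x, which can be reproduced with coef(). The new x values and the names new_points, preds and manual below are hypothetical, chosen only for illustration.

```r
# Rebuild the tutorial's model so this snippet runs on its own
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
model <- lm(y ~ x, data_1)

# Arbitrary new predictor values, wrapped in a data frame with a column named x
new_points <- data.frame(x = c(-1, 0, 1))
preds <- predict(model, newdata = new_points)

# For y ~ x, each prediction is intercept + slope * x
b <- coef(model)
manual <- b[["(Intercept)"]] + b[["x"]] * new_points$x

all.equal(unname(preds), manual)  # TRUE
```

Checking against the formula by hand only makes sense for a simple model like y ~ x; with factors or transformed predictors, it is safer to let predict() build the model matrix.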
Summary
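Congratulations on reading to the end of this tutorial! The error occurs when predict() receives something other than a data frame, such as a single column extracted with $, as its newdata argument. Passing a data frame that contains the model's predictor columns solves it.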
For further reading on R-related errors, go to the articles:
- How to Solve R Error: plot.window(…): need finite ‘ylim’ values
- How to Solve R Error: non-numeric argument to binary operator
- How to Solve R Error: Subscript out of bounds
Go to the online courses page on R to learn more about coding in R for data science and machine learning.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.