# How to Solve R Error model.frame.default: ‘data’ must be a data.frame, environment, or list

In R, the error model.frame.default: ‘data’ must be a data.frame, environment, or list occurs when you pass something other than a data frame, such as a bare vector, to the `newdata` argument of `predict()` when evaluating a model. You can solve this error by ensuring the data you pass to `predict()` is a data frame whose column names match the predictor variables used to fit the model.

This tutorial will go through the error in detail and how to solve it with code examples.

## Example:

Consider the following random data:

```set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)
# Show the first six rows
head(df)```

In the above code, we created a data frame with two columns. The column `x` contains the values for the predictor variable and the column `y` contains the values for the response variable.

```           x          y
1  1.2629543  4.3022713
2 -0.3262334 -0.1395184
3  1.3297993  5.0895748
4  1.2724293  4.8114472
5  0.4146414  1.7231037
6 -1.5399500 -4.5819073```

Next, we will split the data frame in half to create two data frames, one for fitting a linear regression model and the other for prediction.

```data_1 <- data.frame(x, y)[1:500,]
data_2 <- data.frame(x, y)[500:1000,]```
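As a quick sanity check on the split above, the sketch below counts the rows in each half and looks for rows that appear in both. The names `train` and `test` are hypothetical and only illustrate a non-overlapping alternative; the rest of the tutorial keeps using `data_1` and `data_2`.

```r
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
df <- data.frame(x, y)

# The index ranges 1:500 and 500:1000 both include row 500
nrow(df[1:500, ])     # 500
nrow(df[500:1000, ])  # 501
shared <- intersect(rownames(df[1:500, ]), rownames(df[500:1000, ]))
shared                # "500"

# Hypothetical non-overlapping split (illustrative names, not from the tutorial)
train <- df[1:500, ]
test  <- df[501:1000, ]
no_overlap <- length(intersect(rownames(train), rownames(test))) == 0
no_overlap            # TRUE
```

Keeping the two sets disjoint means no observation is used both to fit the model and to evaluate it.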

Now that we have our two data frames (`data_1` with 500 rows and `data_2` with 501, because row 500 falls in both index ranges), we can fit a linear model on `data_1` using `lm()` and obtain the summary statistics using `summary()`.

```model <- lm(y~x, data_1)
summary(model)```
```Call:
lm(formula = y ~ x, data = data_1)

Residuals:
     Min       1Q   Median       3Q      Max
-0.49983 -0.25289 -0.01052  0.26499  0.53284

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.48791    0.01311   37.22   <2e-16 ***
x            3.11686    0.01326  235.09   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2931 on 498 degrees of freedom
Multiple R-squared:  0.9911,	Adjusted R-squared:  0.9911
F-statistic: 5.527e+04 on 1 and 498 DF,  p-value: < 2.2e-16
```

Let’s try to get predictions using the `x` column from the test data frame:

`predict(model, newdata=data_2$x)`
```Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
'data' must be a data.frame, environment, or list```

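The R interpreter raises the error because the `newdata` argument has the wrong type. The `predict()` function expects a data frame, but `data_2$x` extracts the column as a plain numeric vector, which carries no column name that `predict()` can match against the predictor `x` in the model formula.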
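To see the type mismatch directly, it can help to inspect what `$` returns before calling `predict()`. The following is a minimal sketch that rebuilds the data and model from this tutorial, then uses `tryCatch()` to capture the error message instead of halting the script. The variable `msg` is introduced here purely for illustration.

```r
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]
model <- lm(y ~ x, data_1)

# `$` extracts the column as a bare vector, not a data frame
class(data_2$x)          # "numeric"
class(data_2)            # "data.frame"
is.data.frame(data_2$x)  # FALSE

# Capture the error message rather than stopping execution
msg <- tryCatch(
  {
    predict(model, newdata = data_2$x)
    NA_character_  # only reached if no error is raised
  },
  error = function(e) conditionMessage(e)
)
msg
```

A check such as `is.data.frame()` before the call to `predict()` is a cheap way to catch this mistake early in a longer script.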

### Solution

We can solve this error by passing the entire data frame as the `newdata` argument. Because `data_2` contains a column named `x`, the `predict()` function can match it to the predictor in the model and return the predicted values for the test data frame.

`predict(model, newdata=data_2)`

Let’s run the code and show the first six predicted values.

```         500          501          502          503          504          505
-1.308982962 -1.464393709  1.988164730  5.771834227 -5.001588999  1.105088360 ```
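If you only have the predictor values as a vector, or want to pass just the `x` column, there are a few other ways to give `predict()` an acceptable input. The sketch below rebuilds the tutorial's data and model, then compares three alternatives against the full data frame call. The result names `p_full`, `p1`, `p2`, and `p3` are illustrative only.

```r
set.seed(0)
x <- rnorm(1000)
y <- 3.1 * x + runif(1000)
data_1 <- data.frame(x, y)[1:500, ]
data_2 <- data.frame(x, y)[500:1000, ]
model <- lm(y ~ x, data_1)

# Reference: pass the whole data frame, as in the solution above
p_full <- predict(model, newdata = data_2)

# Option 1: single-bracket indexing keeps the result a data frame
p1 <- predict(model, newdata = data_2["x"])

# Option 2: wrap the vector in a new data frame;
# the column name must match the predictor name in the formula (x)
p2 <- predict(model, newdata = data.frame(x = data_2$x))

# Option 3: a named list is also accepted, as the error message suggests
p3 <- predict(model, newdata = list(x = data_2$x))

# All three give the same predictions (names may differ, so drop them)
same_1 <- isTRUE(all.equal(unname(p_full), unname(p1)))
same_2 <- isTRUE(all.equal(unname(p_full), unname(p2)))
same_3 <- isTRUE(all.equal(unname(p_full), unname(p3)))
c(same_1, same_2, same_3)

head(p_full)
```

In each case the key requirement is the same: the object passed to `newdata` must carry a column or element named after the predictor used when the model was fitted.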

## Summary

Congratulations on reading to the end of this tutorial!

Have fun and happy researching!