If you call aggregate() on a data frame and the by argument does not contain a grouping vector with one value per row, R raises the error: aggregate.data.frame(as.data.frame(x), …): arguments must have same length. This error typically occurs if you put the column name in quotation marks inside the by argument of the aggregate() function.
You can solve this error either by passing the column itself to by (for example, mtcars$cyl) instead of its quoted name, or by using a formula: the tilde symbol followed by the column you want to group by, as the first argument of aggregate().
This tutorial will go through the error in detail and how to solve it with code examples.
Example: Aggregate mtcars Dataset
The aggregate function splits the data into subsets, computes the summary statistics for each, and returns the result in a tabular form.
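The split, summarise, combine behaviour described above can be sketched on a tiny data frame. The scores data frame and its team and points columns are hypothetical and exist only for this illustration:

```r
# Hypothetical toy data: two groups, one numeric column
scores <- data.frame(
  team   = c("a", "a", "b", "b"),
  points = c(10, 20, 30, 50)
)

# Manual version: split the values by group, then summarise each subset
manual <- sapply(split(scores$points, scores$team), mean)

# aggregate() performs the same split, apply, and combine steps in one call
result <- aggregate(points ~ team, data = scores, FUN = mean)
print(result)
```

Both approaches should give a mean of 15 for team a and 40 for team b; aggregate() returns them as rows of a data frame rather than as a named vector.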
Let’s consider an example where we group the rows of the mtcars dataset by the number of cylinders (cyl) and apply the mean() function to each column. First, let’s look at the head of the mtcars dataset:
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Next, we will try to call the aggregate() function and pass the cyl column as the by argument.
aggdata <- aggregate(mtcars, by=list("cyl"), FUN=mean)
Let’s run the code to see what happens:
Error in aggregate.data.frame(mtcars, by = list("cyl"), FUN = mean) : arguments must have same length
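The error occurs because we put cyl in quotation marks. list("cyl") is a list holding a single string of length 1, not the cyl column, whereas aggregate() requires every element of by to have one value per row of the data frame (32 rows for mtcars).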
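To see the length mismatch directly, a small sketch can compare the two kinds of by list and capture the error message. The names by_quoted, by_column, and err are illustrative only:

```r
by_quoted <- list("cyl")        # a list holding one string
by_column <- list(mtcars$cyl)   # a list holding the values of the cyl column

print(lengths(by_quoted))  # 1
print(lengths(by_column))  # 32
print(nrow(mtcars))        # 32

# Capture the error message instead of stopping the script
err <- tryCatch(
  aggregate(mtcars, by = by_quoted, FUN = mean),
  error = function(e) conditionMessage(e)
)
print(err)
```

Only by_column matches the number of rows, which is why it is the only one of the two that aggregate() accepts.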
Solution #1
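The first way we can solve this error is by removing the quotation marks and passing the column itself. A bare cyl only works if a vector named cyl is available in the workspace (for example, after calling attach(mtcars)), so we refer to the column as mtcars$cyl. Let’s look at the revised code:
aggdata <- aggregate(mtcars, by=list(mtcars$cyl), FUN=mean)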
Let’s run the code to get the result of the aggregate() function call:
  Group.1      mpg cyl     disp        hp     drat       wt     qsec        vs
1       4 26.66364   4 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909
2       6 19.74286   6 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286
3       8 15.10000   8 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000
         am     gear     carb
1 0.7272727 4.090909 1.545455
2 0.4285714 3.857143 3.428571
3 0.1428571 3.285714 3.500000
We successfully grouped the rows of the data frame by the cyl column and applied the mean() function to every column. The grouping values appear in the Group.1 column because the by list is unnamed.
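As a possible refinement of this approach, naming the element of the by list gives the grouping column a readable name instead of Group.1, and dropping cyl from the input avoids averaging it a second time. This is a sketch rather than part of the original solution; value_cols and agg_named are illustrative names:

```r
# Columns to summarise: everything except the grouping column
value_cols <- setdiff(names(mtcars), "cyl")

agg_named <- aggregate(
  mtcars[value_cols],
  by  = list(cyl = mtcars$cyl),  # the list name becomes the output column name
  FUN = mean
)
print(names(agg_named))
```

The result should have one row per cylinder count, with cyl as its first column.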
Solution #2
The second way we can solve this error is by using a formula. The tilde symbol (~) can be read as "as a function of", so . ~ cyl as the first argument of the aggregate() function means: summarise all other columns in the mtcars dataset as a function of the number of cylinders. Let’s look at the revised code:
aggdata <- aggregate(. ~ cyl, mean, data=mtcars)
Note that because we pass the dataset by name (data=mtcars), the unnamed mean is matched to the FUN argument. Let’s run the code to get the result:
  cyl      mpg     disp        hp     drat       wt     qsec        vs        am
1   4 26.66364 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909 0.7272727
2   6 19.74286 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286 0.4285714
3   8 15.10000 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000 0.1428571
      gear     carb
1 4.090909 1.545455
2 3.857143 3.428571
3 3.285714 3.500000
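We successfully grouped the rows of the data frame by the cyl column and applied the mean() function. Unlike in Solution #1, the grouping column keeps its name (cyl) and does not appear a second time as an averaged column.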
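The formula interface is not limited to . ~ cyl. As a brief sketch (agg_two and agg_multi are illustrative names), the left-hand side can select specific columns with cbind(), and the right-hand side can list more than one grouping column:

```r
# Mean of mpg and hp only, grouped by cyl
agg_two <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
print(agg_two)

# Mean of mpg, grouped by every observed combination of cyl and gear
agg_multi <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
print(agg_multi)
```

Only combinations of cyl and gear that actually occur in the data produce a row in the second result.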
Summary
Congratulations on reading to the end of this tutorial!
For further reading on R related errors, go to the articles:
- How to Solve R Error: plot.window(…): need finite ‘ylim’ values
- How to Solve R Error in file(file, “rt”) cannot open the connection
- How to Solve R Error: aesthetics must be either length 1 or the same as the data
- How to Solve R Error: continuous value supplied to discrete scale
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.