How to Solve R error in aggregate.data.frame(as.data.frame(x), …) : arguments must have same length


If you call aggregate() on a data frame and the by argument does not contain a grouping vector with one value per row, you will raise the error: aggregate.data.frame(as.data.frame(x), …): arguments must have same length. This error typically occurs if you put the column name in quotation marks inside the by argument, for example by=list("cyl"), because R then sees a single character string instead of the column.

You can solve this error either by passing the column itself in the by argument (for example mtcars$cyl) or by using the formula interface, where the column you want to group by follows the tilde symbol in the first argument of aggregate().

This tutorial will go through the error in detail and how to solve it with code examples.


Example: Aggregate mtcars Dataset

The aggregate function splits the data into subsets, computes the summary statistics for each, and returns the result in a tabular form.
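As a rough illustration of that split, compute, and combine behaviour, the sketch below reproduces one aggregate() result by hand using the mpg and cyl columns of the built-in mtcars dataset. The variable names groups, manual, and builtin are illustrative only, and this is a simplified picture rather than a description of how aggregate() is implemented internally.

```r
# Split the mpg values into one subset per distinct value of cyl
groups <- split(mtcars$mpg, mtcars$cyl)

# Compute the summary statistic (here the mean) for each subset
manual <- sapply(groups, mean)

# aggregate() performs the same steps and returns a data frame
builtin <- aggregate(mtcars$mpg, by = list(cyl = mtcars$cyl), FUN = mean)

# Compare the hand-computed means with the aggregate() result
print(manual)
print(builtin)
print(all.equal(unname(manual), builtin$x))
```

The manual version returns a named vector, whereas aggregate() returns a data frame with one row per group, which is usually more convenient for further processing.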

Let’s consider an example where we group the rows of the mtcars dataset by the number of cylinders (cyl) and apply the mean() function to each column. First, let’s look at the head of the mtcars dataset:

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Next, we will try to call the aggregate() function and pass the column name cyl, in quotation marks, inside the by argument.

aggdata <- aggregate(mtcars, by=list("cyl"), FUN=mean)

Let’s run the code to see what happens:

Error in aggregate.data.frame(mtcars, by = list("cyl"), FUN = mean) : 
  arguments must have same length

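The error occurs because we used quotation marks for the cyl variable. list("cyl") contains the single character string "cyl", not the cyl column. aggregate() requires every element of by to have one value per row of the data frame (32 for mtcars), so a grouping element of length 1 does not match.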
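To see the mismatch concretely, the following sketch compares the lengths involved and captures the error message with tryCatch() instead of stopping the script. The names by_quoted and msg are illustrative and not part of the original example.

```r
# A list holding the string "cyl": its only element has length 1
by_quoted <- list("cyl")
print(lengths(by_quoted))

# The data frame has 32 rows, and so does the real cyl column
print(nrow(mtcars))
print(length(mtcars$cyl))

# Capture the error message rather than letting the call abort
msg <- tryCatch(
  aggregate(mtcars, by = by_quoted, FUN = mean),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking the length of each by element against nrow() of the data frame is a quick way to diagnose this error in your own code.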

Solution #1

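The first way we can solve this error is by removing the quotation marks and passing the column itself. Note that a bare cyl only works if mtcars has been attached with attach(mtcars); otherwise R reports that the object cyl was not found, so we refer to the column as mtcars$cyl. Let’s look at the revised code:

aggdata <- aggregate(mtcars, by=list(mtcars$cyl), FUN=mean)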

Let’s run the code to get the result of the aggregate() function call:

  Group.1      mpg cyl     disp        hp     drat       wt     qsec        vs
1       4 26.66364   4 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909
2       6 19.74286   6 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286
3       8 15.10000   8 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000
         am     gear     carb
1 0.7272727 4.090909 1.545455
2 0.4285714 3.857143 3.428571
3 0.1428571 3.285714 3.500000

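We successfully grouped the rows of the data frame by the cyl column and applied the mean() function to every column. Because the element of the by list is unnamed, the grouping column is labelled Group.1, and cyl itself is also averaged within each group.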
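If the Group.1 label and the redundant cyl column are unwanted, one possible variation is to name the element of the by list and leave cyl out of the data being summarised. This is a sketch of one option, not the only way to do it; value_cols and aggdata_named are illustrative names.

```r
# All columns except the grouping column
value_cols <- setdiff(names(mtcars), "cyl")

# Naming the list element labels the grouping column "cyl"
aggdata_named <- aggregate(
  mtcars[value_cols],
  by = list(cyl = mtcars$cyl),
  FUN = mean
)

print(names(aggdata_named))
print(aggdata_named[, c("cyl", "mpg")])
```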

Solution #2

The second way we can solve this error is by using the formula interface of aggregate(). In an R formula, the tilde symbol (~) can be read as "as a function of", and the dot (.) stands for all other columns in the data. Therefore, passing . ~ cyl as the first argument of the aggregate function means: summarise every other column of the mtcars dataset, grouped by the number of cylinders. Let’s look at the revised code:

aggdata <- aggregate(. ~ cyl, mean, data=mtcars)

Note that we pass the mean() function and the dataset as the second and third arguments. Let’s run the code to get the result:

  cyl      mpg     disp        hp     drat       wt     qsec        vs        am
1   4 26.66364 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909 0.7272727
2   6 19.74286 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286 0.4285714
3   8 15.10000 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000 0.1428571
      gear     carb
1 4.090909 1.545455
2 3.857143 3.428571
3 3.285714 3.500000

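We successfully grouped the rows of the data frame by the cyl column and applied the mean() function to every other column. With the formula interface, the grouping column keeps its name, cyl, and is not averaged itself.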
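Sometimes the grouping column really is only available as a string, for example when it comes from a function argument. The sketch below shows two ways to handle that case under that assumption: indexing the data frame with the string, which yields a one-column data frame (itself a list of full-length columns), or building the formula from the string. The names group_col, by_string, and by_formula are illustrative.

```r
group_col <- "cyl"  # column name held as a character string

# Option 1: mtcars[group_col] is a one-column data frame, i.e. a list
# whose element has one value per row, so it is a valid by argument
value_cols <- setdiff(names(mtcars), group_col)
by_string <- aggregate(mtcars[value_cols], by = mtcars[group_col], FUN = mean)

# Option 2: build the formula . ~ cyl from the string
f <- as.formula(paste(". ~", group_col))
by_formula <- aggregate(f, data = mtcars, FUN = mean)

# Both results should have the same columns and the same group means
print(identical(names(by_string), names(by_formula)))
print(all.equal(by_string$mpg, by_formula$mpg))
```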

Summary

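Congratulations on reading to the end of this tutorial! The error arises when an element of the by argument does not have one value per row of the data frame, as with list("cyl"). To fix it, pass the column itself in the by list or use the formula interface of aggregate().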

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
