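This R message appears when you group by more than one column with group_by() and then call dplyr::summarise() without setting its .groups argument.
When .groups is not supplied and every summary returns one row per group, summarise() falls back to 'drop_last'. This removes the last grouping variable from the result and prints a message about the grouping that remains. If you set the .groups argument manually, the message will not appear.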
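The sketch below shows why the number of grouping columns matters. It assumes dplyr 1.0.0 or later and runs the same summary with one grouping key and then with two, collecting any messages. collect_messages() is a hypothetical helper written for this illustration and is not part of dplyr. With a single key, dropping that key leaves the result ungrouped, so dplyr has nothing to report.

```r
library(dplyr)

# Hypothetical helper: evaluate an expression and return the text of any
# messages it emits, muffling them so they are not printed
collect_messages <- function(expr) {
  msgs <- character(0)
  withCallingHandlers(
    expr,  # forcing the lazy argument here runs the pipeline under the handler
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  msgs
}

one_key  <- collect_messages(mtcars %>% group_by(cyl) %>% summarise(avg_mpg = mean(mpg)))
two_keys <- collect_messages(mtcars %>% group_by(cyl, am) %>% summarise(avg_mpg = mean(mpg)))

print(length(one_key))   # 0: dropping cyl leaves the result ungrouped
print(length(two_keys))  # 1: the result is still grouped by cyl
```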
It is only an informational message (strictly a message, not a warning) and does not affect the final output. This tutorial will show you how to reproduce the message and stop it from appearing.
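Both claims can be checked directly. The sketch below assumes dplyr 1.0.0 or later. tryCatch() reports which kind of condition the default summarise() call signals, and the default result is then compared with one where .groups = "drop_last" is set explicitly.

```r
library(dplyr)

# Catch whichever condition the default summarise() call signals first;
# tryCatch() returns the condition object instead of the summary
cond <- tryCatch(
  mtcars %>% group_by(cyl, am) %>% summarise(avg_mpg = mean(mpg)),
  message = function(m) m,
  warning = function(w) w
)
print(inherits(cond, "message"))  # TRUE: an informational message
print(inherits(cond, "warning"))  # FALSE: not a warning

# The default result matches an explicit .groups = "drop_last" call
res_default <- suppressMessages(
  mtcars %>% group_by(cyl, am) %>% summarise(avg_mpg = mean(mpg))
)
res_explicit <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop_last")
print(identical(res_default, res_explicit))  # TRUE
```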
Example
Let’s look at an example that reproduces the message. We will load the dplyr package, group the mtcars dataset by the number of cylinders (cyl) and the transmission type (am, where 0 = automatic and 1 = manual), and compute the average miles per gallon for each group:
library(dplyr)

mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg))
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 6 × 3
# Groups:   cyl [3]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4
Without the .groups argument, summarise() drops the last grouping key (here am) by default. When there are several grouping columns, the result is therefore still a grouped data frame.
The message tells us that the output remains grouped by cyl, the first of the original grouping keys, because the default .groups = 'drop_last' removed only am.
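Although the am grouping key was dropped, both cyl and am are still present as columns in the output. Only the grouping metadata, shown in the # Groups: cyl [3] line, has changed.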
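You can also inspect the remaining grouping directly instead of reading it off the printed header. The sketch below assumes dplyr 1.0.0 or later and calls group_vars() and n_groups() on the default result. suppressMessages() is only there to keep the console quiet.

```r
library(dplyr)

# Default behaviour: .groups is not set, so summarise() drops only the last
# grouping variable (am)
res <- suppressMessages(
  mtcars %>%
    group_by(cyl, am) %>%
    summarise(avg_mpg = mean(mpg))
)

print(group_vars(res))  # "cyl": still grouped by the first key
print(names(res))       # "cyl" "am" "avg_mpg": am is still a column
print(n_groups(res))    # 3: one group per cylinder count
```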
Solution #1: Set .groups argument
If we check the documentation for the summarise function, we can see the .groups argument and its possible settings: 'drop_last', 'drop', 'keep' and 'rowwise'.
?summarise
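The following sketch compares the four settings side by side. It assumes dplyr 1.0.0 or later, runs the same summary once per setting, and tabulates the class of each result along with the grouping variables it still carries. The names settings, results and comparison are illustrative only.

```r
library(dplyr)

# The four documented .groups settings
settings <- c("drop_last", "drop", "keep", "rowwise")

# Run the same summary once per setting; naming the list lets us index by setting
results <- lapply(setNames(settings, settings), function(g) {
  mtcars %>%
    group_by(cyl, am) %>%
    summarise(avg_mpg = mean(mpg), .groups = g)
})

# Tabulate the class of each result and the grouping variables it still carries
comparison <- data.frame(
  setting    = settings,
  class      = vapply(results, function(r) class(r)[1], character(1)),
  group_vars = vapply(results, function(r) paste(group_vars(r), collapse = ", "),
                      character(1)),
  row.names  = NULL
)
print(comparison)
```

'drop' can be a convenient choice when the summary is the last step of a pipeline, because it returns an ungrouped tibble.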
If we set the .groups argument in summarise(), the message will not appear. For example, we can set it explicitly to the default value:
library(dplyr)

mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = 'drop_last')
# A tibble: 6 × 3
# Groups:   cyl [3]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4
The message was not issued this time. The message exists because we might go on to call mutate() on the result and assume it is no longer grouped. Since the data is still grouped by cyl, the calculation would run within each group and produce unexpected output. The message therefore tells us how summarise() handles the grouping by default.
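The sketch below illustrates the kind of surprise the message guards against. It assumes dplyr 1.0.0 or later, and the share column is a hypothetical calculation: each row's fraction of the total avg_mpg. On the still-grouped result, sum() runs within each cyl group instead of over the whole table.

```r
library(dplyr)

# Same grouping as the default: still grouped by cyl
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop_last")

# Intended: each row's share of the overall total of avg_mpg.
# Because the data is still grouped by cyl, sum() runs within each cyl group,
# so the shares add up to 1 per cylinder count rather than 1 overall.
grouped_share <- by_cyl_am %>%
  mutate(share = avg_mpg / sum(avg_mpg))

# Removing the remaining grouping first gives shares of the overall total
ungrouped_share <- by_cyl_am %>%
  ungroup() %>%
  mutate(share = avg_mpg / sum(avg_mpg))

print(sum(grouped_share$share))    # 3: one full share per cyl group
print(sum(ungrouped_share$share))  # 1
```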
Let’s look at what happens when we use .groups = 'keep':
library(dplyr)

mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = 'keep')
# A tibble: 6 × 3
# Groups:   cyl, am [6]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4
The # Groups: cyl, am [6] line of the printed tibble shows that both grouping variables were kept, giving six groups.
Solution #2: Use options(dplyr.summarise.inform=FALSE)
If you are happy with the default .groups = 'drop_last' behaviour and do not plan to change it, you can turn off the message for the rest of the R session with options(dplyr.summarise.inform=FALSE). Let’s look at the revised example:
library(dplyr)

# Turn off the summarise() grouping message for this session
options(dplyr.summarise.inform = FALSE)

mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg))
# A tibble: 6 × 3
# Groups:   cyl [3]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4
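Note that the message does not appear when summarise() is called with this global option set to FALSE. The option stays in effect until you change it back or restart R. It does not change the grouping of the result, which is still grouped by cyl.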
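Because options() changes the setting globally, you may prefer to silence the message inside a single function only. The minimal sketch below uses base R's options() and on.exit(). quiet_summary() is a hypothetical helper, not a dplyr function. options() returns the previous value, and on.exit() restores it when the function returns.

```r
library(dplyr)

# Hypothetical helper: silence the grouping message for this call only
quiet_summary <- function(data) {
  # options() sets the new value and returns the previous one
  old <- options(dplyr.summarise.inform = FALSE)
  # Restore the previous value when the function exits, even on error
  on.exit(options(old), add = TRUE)
  data %>%
    group_by(cyl, am) %>%
    summarise(avg_mpg = mean(mpg))
}

before <- getOption("dplyr.summarise.inform")
res    <- quiet_summary(mtcars)  # no message printed
after  <- getOption("dplyr.summarise.inform")

print(identical(before, after))  # TRUE: the global setting is unchanged
print(group_vars(res))           # "cyl": grouping is still drop_last
```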
Summary
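Congratulations on reading to the end of this tutorial! The `summarise()` has grouped output message is informational. It appears when you group by more than one column and do not set .groups. To stop it, either set .groups explicitly in each summarise() call or turn it off globally with options(dplyr.summarise.inform=FALSE).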
For further reading on R-related errors, go to the articles:
- How to Solve R Error as.Date.numeric(x) : ‘origin’ must be supplied
- How to Solve R Error: Incorrect number of dimensions
- How to Solve R Error in solve.default() Lapack routine dgesv: system is exactly singular
- How to Solve R Error in colMeans(x, na.rm = TRUE): ‘x’ Must be Numeric
Go to the online courses page on R to learn more about coding in R for data science and machine learning.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.