
How to Solve R Warning: `summarise()` has grouped output by ‘X’. You can override using the `.groups` argument

by | Programming, R, Tips

This R warning appears when you call dplyr::summarise() on data grouped by more than one column with group_by() and do not set the .groups argument.

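The summarise() function has a .groups argument. When you leave it unset, summarise() falls back to ‘drop_last’ (as long as every group returns a single row), which removes the last grouping variable and reports the grouping that remains. If you set the .groups argument explicitly, the warning will not appear.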

Strictly speaking, it is an informational message rather than a true warning, and it does not affect the final output. This tutorial will show you how to reproduce the message and stop it from appearing.
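To see exactly when the message is triggered, here is a minimal sketch that runs summarise() with one grouping column and then with two, collecting any messages instead of printing them. capture_msgs() is a hypothetical helper built only on base R's withCallingHandlers(); the sketch assumes a recent dplyr release (1.1.x) and that the dplyr.summarise.inform option has not been set.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): evaluate an expression and
# collect any messages it emits instead of printing them.
capture_msgs <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, messages = msgs)
}

# One grouping column: 'drop_last' removes it, so the result is ungrouped
one_key <- capture_msgs(
  mtcars %>% group_by(cyl) %>% summarise(avg_mpg = mean(mpg))
)

# Two grouping columns: the result stays grouped by cyl, so dplyr reports it
two_keys <- capture_msgs(
  mtcars %>% group_by(cyl, am) %>% summarise(avg_mpg = mean(mpg))
)

length(one_key$messages)
two_keys$messages
```

With one grouping column there is no grouping left to report, so no message is collected; with two, the result is still grouped by cyl and the message appears.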


Example

Let’s look at an example to reproduce the warning. We will load the dplyr package, group the mtcars dataset by the number of cylinders (cyl) and the transmission type (am, where 0 = automatic and 1 = manual), and then compute the mean mpg for each group.

library(dplyr)
# Group mtcars by cylinders and transmission, then average mpg in each group
mtcars %>%
     group_by(cyl, am) %>% 
     summarise(avg_mpg = mean(mpg))
`summarise()` has grouped output by 'cyl'. You can override using the `.groups`
argument.
# A tibble: 6 × 3
# Groups:   cyl [3]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4

When the .groups argument is not set, summarise() removes the last grouping variable (am) by default, so the result is still a grouped data frame whenever there were two or more grouping columns.

The warning tells us that, under the default .groups = 'drop_last', only the first of the original grouping variables (cyl) was preserved.

Although am is no longer a grouping variable, both cyl and am still appear as columns in the output.
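One way to check this is to store the result and inspect it with dplyr's group_vars(). The sketch below writes .groups = "drop_last" out explicitly only so that the check runs quietly; it produces the same result as the default.

```r
library(dplyr)

# Same summary as above; "drop_last" is what summarise() uses by default here
res <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop_last")

group_vars(res)  # grouping variables left on the result: "cyl"
names(res)       # columns in the result: "cyl" "am" "avg_mpg"
```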

Solution #1: Set .groups argument

If we check the documentation for the summarise function, we can see the .groups argument and the possible settings for it.

?summarise

If we set the .groups argument in summarise, the warning message will not appear. For example:

library(dplyr)
mtcars %>%
     group_by(cyl, am) %>% 
     summarise(avg_mpg = mean(mpg), .groups='drop_last')
# A tibble: 6 × 3
# Groups:   cyl [3]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4

We can see the warning was not issued, and the output is identical to the default. The warning exists because the grouping left on the result affects later steps: if we go on to call mutate() assuming the data frame is ungrouped, the calculation will actually be done within each cyl group, which can produce unexpected output. The message simply tells us how summarise() has handled the grouping by default.
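To make that risk concrete, the sketch below computes each row's share of avg_mpg with mutate(), once on the still-grouped result and once after ungroup(). The share column is purely illustrative and not part of the original example.

```r
library(dplyr)

# Default behaviour written out explicitly: the result is still grouped by cyl
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop_last")

# mutate() runs within each cyl group, so shares sum to 1 inside every cyl value
grouped_share <- by_cyl_am %>%
  mutate(share = avg_mpg / sum(avg_mpg))

# After ungroup(), the same mutate() uses all six rows at once
overall_share <- by_cyl_am %>%
  ungroup() %>%
  mutate(share = avg_mpg / sum(avg_mpg))

sum(grouped_share$share)  # 3: one full share per cyl group
sum(overall_share$share)  # 1
```

The same mutate() call gives different numbers depending only on whether grouping was left on the data, which is the kind of surprise the message is meant to flag.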

Let’s look at what happens when we use .groups='keep':

library(dplyr)
mtcars %>%
     group_by(cyl, am) %>% 
     summarise(avg_mpg = mean(mpg), .groups='keep')
# A tibble: 6 × 3
# Groups:   cyl, am [6]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4

The # Groups: cyl, am [6] line in the printed output shows that both grouping variables were kept, giving six groups (one per combination of cyl and am).
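Besides ‘drop_last’ and ‘keep’, ?summarise lists two more values for .groups: ‘drop’, which drops all levels of grouping, and ‘rowwise’, which makes each row its own group. As a rough sketch, the same summary can be run with all four values and the class and grouping variables of each result compared:

```r
library(dplyr)

# The four values accepted by .groups, as listed in ?summarise
group_options <- c("drop_last", "drop", "keep", "rowwise")

# Run the same summary once per option
results <- lapply(group_options, function(g) {
  mtcars %>%
    group_by(cyl, am) %>%
    summarise(avg_mpg = mean(mpg), .groups = g)
})
names(results) <- group_options

# Compare the resulting class and the grouping variables left on each result
sapply(results, function(x) class(x)[1])
sapply(results, function(x) paste(group_vars(x), collapse = ", "))
```

Because .groups is set explicitly in every call, none of these runs prints the message.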

Solution #2: Use options(dplyr.summarise.inform=FALSE)

If you are happy with the default ‘drop_last’ grouping and only want to hide the message, you can turn it off using options(dplyr.summarise.inform=FALSE). Let’s look at the revised example:

library(dplyr)
options(dplyr.summarise.inform=FALSE)
mtcars %>%
     group_by(cyl, am) %>% 
     summarise(avg_mpg = mean(mpg))
# A tibble: 6 × 3
# Groups:   cyl [3]
    cyl    am avg_mpg
  <dbl> <dbl>   <dbl>
1     4     0    22.9
2     4     1    28.1
3     6     0    19.1
4     6     1    20.6
5     8     0    15.0
6     8     1    15.4

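Note that the warning does not appear when using summarise() with this global option set to FALSE. The option stays in effect for the rest of the R session; running options(dplyr.summarise.inform = NULL) restores the default.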
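Because options() changes the setting for the whole session, you may prefer to silence the message only where you need to. Here is a sketch of two scoped alternatives using base R: temporarily setting the option inside a function and restoring it with on.exit(), or wrapping a single pipeline in suppressMessages(). The function name avg_mpg_by_cyl_am() is hypothetical.

```r
library(dplyr)

# Hypothetical helper: silences the summarise() grouping message only while
# it runs, then restores the previous value of the option on exit
avg_mpg_by_cyl_am <- function(data) {
  old <- options(dplyr.summarise.inform = FALSE)
  on.exit(options(old), add = TRUE)
  data %>%
    group_by(cyl, am) %>%
    summarise(avg_mpg = mean(mpg))
}

res1 <- avg_mpg_by_cyl_am(mtcars)

# Alternative: suppress messages for a single expression.
# Note this hides every message raised inside the expression, not only dplyr's.
res2 <- suppressMessages(
  mtcars %>%
    group_by(cyl, am) %>%
    summarise(avg_mpg = mean(mpg))
)

getOption("dplyr.summarise.inform")  # back to its previous value
```

Either way, the result is still grouped by cyl; only the message is hidden.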

Summary

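Congratulations on reading to the end of this tutorial! The `summarise()` has grouped output message appears when you summarise data grouped by more than one column without setting .groups. To stop it, either set .groups explicitly (for example .groups='drop_last' or .groups='keep') or turn the message off with options(dplyr.summarise.inform=FALSE).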


Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!