This warning occurs when you use the dcast function to convert a data frame from long to wide format, but more than one value can be placed in an individual output cell of the wide data frame. You can stop this warning from occurring by specifying the fun.aggregate argument when using dcast.
This tutorial will explain how to solve the R warning with code examples.
Let’s look at an example to reproduce the warning. First, let’s create a data frame containing the test scores of two students on two exams each in Chemistry and Physics.
#create data frame
df <- data.frame(student=c('Alex', 'Alex', 'Alex', 'Alex', 'Bill', 'Bill', 'Bill', 'Bill'),
                 subject=c('Chemistry', 'Chemistry', 'Physics', 'Physics', 'Chemistry', 'Chemistry', 'Physics', 'Physics'),
                 test=c('Exam1', 'Exam2', 'Exam1', 'Exam2', 'Exam1', 'Exam2', 'Exam1', 'Exam2'),
                 score=c(82, 78, 69, 80, 50, 61, 75, 82))
df

  student   subject  test score
1    Alex Chemistry Exam1    82
2    Alex Chemistry Exam2    78
3    Alex   Physics Exam1    69
4    Alex   Physics Exam2    80
5    Bill Chemistry Exam1    50
6    Bill Chemistry Exam2    61
7    Bill   Physics Exam1    75
8    Bill   Physics Exam2    82
Next, we will install (if not installed previously) and load the reshape2 package, and then use the reshape2 dcast function to cast the data frame from long to wide format.
install.packages('reshape2')
library(reshape2)
df_wide <- dcast(df, student ~ test, value.var="score")
df_wide

Aggregation function missing: defaulting to length
  student Exam1 Exam2
1    Alex     2     2
2    Bill     2     2
We can see that the dcast function works, but we receive a warning message from the R interpreter: Aggregation function missing: defaulting to length.
The warning occurs because there are multiple values of score that can go into each output cell of the wide data frame.
For example, for the student Alex and the test Exam1, the score could be 82 for Chemistry or 69 for Physics.
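To see the competing values directly, one option is to filter the long data frame down to a single student and test. The sketch below rebuilds the example data frame so that it runs on its own; the variable name alex_exam1 is only illustrative.

```r
# Rebuild the example data frame (same values as above)
df <- data.frame(
  student = rep(c("Alex", "Bill"), each = 4),
  subject = rep(c("Chemistry", "Chemistry", "Physics", "Physics"), times = 2),
  test    = rep(c("Exam1", "Exam2"), times = 4),
  score   = c(82, 78, 69, 80, 50, 61, 75, 82)
)

# Keep only the rows that would land in the Alex / Exam1 cell
alex_exam1 <- subset(df, student == "Alex" & test == "Exam1")
alex_exam1
```

The result has two rows, one per subject, with scores 82 and 69, so a single wide cell cannot hold both without some form of aggregation.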
Because there is more than one value to choose from and the fun.aggregate argument is not specified, the dcast function defaults to using length as the aggregate function. The output therefore contains counts rather than scores: for student Alex and Exam1, there are 2 scores in total.
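Before reshaping, it can help to check how many values would fall into each cell. The following is a minimal sketch using base R's aggregate function rather than reshape2; the names cell_counts and n_scores are illustrative, not part of the original example.

```r
# Rebuild the example data frame (same values as above)
df <- data.frame(
  student = rep(c("Alex", "Bill"), each = 4),
  subject = rep(c("Chemistry", "Chemistry", "Physics", "Physics"), times = 2),
  test    = rep(c("Exam1", "Exam2"), times = 4),
  score   = c(82, 78, 69, 80, 50, 61, 75, 82)
)

# Count how many scores share each student/test combination
cell_counts <- aggregate(score ~ student + test, data = df, FUN = length)
names(cell_counts)[3] <- "n_scores"
cell_counts

# Any count above 1 means the wide cell needs an aggregate function
any(cell_counts$n_scores > 1)
```

Here every student/test combination has a count of 2, which matches the values dcast produced when it defaulted to length.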
We can use a different aggregate function, such as mean or sd. In this example, mean is suitable, because it gives each student’s average score on each exam across the two subjects.
Let’s look at the revised code:
install.packages('reshape2')
library(reshape2)
df_wide <- dcast(df, student ~ test, value.var="score", fun.aggregate=mean)
df_wide

  student Exam1 Exam2
1    Alex  75.5  79.0
2    Bill  62.5  71.5
When we run the code with the fun.aggregate argument specified, we do not receive the warning message.
We can interpret this as follows. Given that both Physics and Chemistry have an Exam1 and an Exam2, each value is the average over the two subjects:
- Student Alex has an average score of 75.5 for Exam1 and an average score of 79.0 for Exam2
- Student Bill has an average score of 62.5 for Exam1 and an average score of 71.5 for Exam2
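As a cross-check on the averages listed above, the same figures can be reproduced in base R without reshape2. This is only a sketch: it rebuilds the example data frame and uses tapply, and the variable name cell_means is illustrative.

```r
# Rebuild the example data frame (same values as above)
df <- data.frame(
  student = rep(c("Alex", "Bill"), each = 4),
  subject = rep(c("Chemistry", "Chemistry", "Physics", "Physics"), times = 2),
  test    = rep(c("Exam1", "Exam2"), times = 4),
  score   = c(82, 78, 69, 80, 50, 61, 75, 82)
)

# Mean score for every student/test pair, returned as a matrix
# with students as rows and tests as columns
cell_means <- with(df, tapply(score, list(student, test), mean))
cell_means
```

For example, Alex's Exam1 value is the mean of 82 (Chemistry) and 69 (Physics), which is 75.5, in agreement with the dcast output.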
Congratulations on reading to the end of this tutorial!
For further reading on R-related errors, go to the articles:
- How to Solve R Error: $ operator is invalid for atomic vectors
- How to Solve R Error: object of type ‘closure’ is not subsettable
- How to Solve R Error missing value where TRUE/FALSE needed
Have fun and happy researching!