The pipe operator, %>%, is a special function provided by the magrittr package. It passes the result of one function call as the first argument of the next, so that calls can be chained in sequence. To use the pipe operator, you need to install and load the magrittr package.
install.packages("magrittr")
library(magrittr)
This tutorial will go through how to solve the error could not find function "%>%" with code examples.
Example
The pipe operator allows us to pass an intermediate result onto the next function. Consider the following example of calling multiple mathematical functions on a numerical vector.
x <- c(0.1, 0.3, 0.6, 0.9, 0.5, 0.1, 0.01, 0.8, 0.9)
x %>% log() %>% diff() %>% exp() %>% round(1)
In the above code, we compute the logarithm of x and then pass the resultant vector to the diff() function. The diff() function finds the differences between consecutive elements of the vector, and we pass the result to the exp() function. We then round the result of the exponential function call to one decimal place.
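To make the chain concrete, the same computation can be written without the pipe as a nested base R call. This is only an illustrative sketch; the names `nested` and `ratios` are not part of the original example.

```r
# Base R only: no magrittr needed for this version.
x <- c(0.1, 0.3, 0.6, 0.9, 0.5, 0.1, 0.01, 0.8, 0.9)

# The pipeline x %>% log() %>% diff() %>% exp() %>% round(1)
# corresponds to this nested call, read from the inside out.
nested <- round(exp(diff(log(x))), 1)

# exp(diff(log(x))) equals the ratio of each element to the one before it,
# so the result has one element fewer than x.
ratios <- x[-1] / x[-length(x)]
print(length(nested))                                # 8
print(isTRUE(all.equal(exp(diff(log(x))), ratios)))  # TRUE
```

The nested form runs in any R session, which is a quick way to confirm that a failure comes from the missing pipe operator and not from the functions in the chain.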
Let’s run the code to see what happens:
Error in x %>% log() %>% diff() %>% exp() %>% round(1) : could not find function "%>%"
The error occurs because we did not load the magrittr package, which provides the pipe operator.
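Before changing any code, it can help to check whether the operator is visible and whether the package is installed at all. The following is a minimal diagnostic sketch using base R functions; the variable names are illustrative.

```r
# Is a function named %>% visible from the current session?
pipe_visible <- exists("%>%", mode = "function")

# Is magrittr installed at all? requireNamespace() loads the namespace
# if it can, but does not attach the package to the search path.
magrittr_installed <- requireNamespace("magrittr", quietly = TRUE)

cat("pipe visible:       ", pipe_visible, "\n")
cat("magrittr installed: ", magrittr_installed, "\n")
```

If the package is installed but the pipe is not visible, calling library(magrittr) should be enough; if the package is not installed, it needs to be installed first.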
Solution
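We can install and load the magrittr package as follows. The installation only needs to be done once, but library(magrittr) must be called in every new R session.
install.packages("magrittr")
library(magrittr)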
Then we can perform the series of computations on the numerical vector and see the result.
x <- c(0.1, 0.3, 0.6, 0.9, 0.5, 0.1, 0.01, 0.8, 0.9)
x %>% log() %>% diff() %>% exp() %>% round(1)
[1]  3.0  2.0  1.5  0.6  0.2  0.1 80.0  1.1
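In a script that is run repeatedly, reinstalling the package on every run is unnecessary. One common pattern, sketched below, installs magrittr only when it is missing. The repos URL is an assumption (the CRAN cloud mirror) and the name `result` is illustrative.

```r
# Install magrittr only if it is missing. The repos URL is an assumption
# (the CRAN cloud mirror); substitute your preferred mirror.
if (!requireNamespace("magrittr", quietly = TRUE)) {
  install.packages("magrittr", repos = "https://cloud.r-project.org")
}
library(magrittr)

x <- c(0.1, 0.3, 0.6, 0.9, 0.5, 0.1, 0.01, 0.8, 0.9)
result <- x %>% log() %>% diff() %>% exp() %>% round(1)
print(result)
```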
Example #2
Let’s look at a second example where we want to perform some data manipulation on the mtcars dataset. Specifically, we want to keep only the cars with more than one carburettor (carb > 1), then group by number of cylinders (cyl), then create a new data frame with the average miles per gallon (mpg) for each number of cylinders.
mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(Avg_mpg = mean(mpg))
Let’s run the code to see what happens:
Error in mtcars %>% filter(carb > 1) %>% group_by(cyl) %>% summarize(Avg_mpg = mean(mpg)) : could not find function "%>%"
We can solve the pipe operator error by loading the magrittr package. Let’s look at the revised code:
library(magrittr)
mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(Avg_mpg = mean(mpg))
Let’s run the code to see what happens:
Error in summarize(., Avg_mpg = mean(mpg)) : could not find function "summarize"
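The pipe operator error disappears, but we have a new error stating that R could not find the function “summarize”. The magrittr package provides only the pipe; summarize and group_by come from dplyr, as does the filter function that accepts a condition such as carb > 1.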
Solution #1: Load dplyr
We can solve both the missing pipe operator and the missing summarize function errors by installing and loading the dplyr package, which re-exports %>% from magrittr. The dplyr package provides a consistent set of verbs to help solve the most common data manipulation challenges. The verbs it provides include filter, group_by, and summarize/summarise. Let’s look at the revised code:
install.packages("dplyr")
library(dplyr)
mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(Avg_mpg = mean(mpg))
Let’s run the code to see the resultant data frame.
# A tibble: 3 × 2
    cyl Avg_mpg
  <dbl>   <dbl>
1     4    25.9
2     6    19.7
3     8    15.1
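As a sanity check on these averages, the same summary can be computed with base R alone. This is a sketch, not the tutorial’s method; the names `multi_carb` and `check` are illustrative.

```r
# Base R only: keep cars with more than one carburettor,
# then average mpg within each cylinder count.
multi_carb <- subset(mtcars, carb > 1)
check <- aggregate(mpg ~ cyl, data = multi_carb, FUN = mean)
names(check)[2] <- "Avg_mpg"

# Rounded to one decimal place: 25.9, 19.7 and 15.1 for 4, 6 and 8 cylinders.
print(round(check, 1))
```

The result is a plain data frame and not a tibble, but the group means should agree with the dplyr output above.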
Solution #2: Load tidyverse
We can also solve this error by installing and loading tidyverse, which provides a set of packages for data science, including dplyr and magrittr. Loading tidyverse is preferable if the code’s focus is to manipulate, explore and visualize data.
Let’s look at the revised code:
install.packages("tidyverse")
library(tidyverse)
mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(Avg_mpg = mean(mpg))
Let’s run the code to see the result:
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
# A tibble: 3 × 2
    cyl Avg_mpg
  <dbl>   <dbl>
1     4    25.9
2     6    19.7
3     8    15.1
We successfully retrieved the data frame containing the average miles per gallon for each number of cylinders.
Note that when we load tidyverse, R tells us which packages it is attaching and which conflicts arise. We can see that the dplyr functions filter and lag mask the stats functions of the same names.
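When masking is a concern, one option is to qualify each call with its package name. The sketch below assumes dplyr is already installed; the names `result` and `moving_avg` are illustrative.

```r
library(dplyr)  # assumes dplyr is already installed

# Writing package::function() makes the intended function explicit,
# regardless of what is masked on the search path.
result <- mtcars %>%
  dplyr::filter(carb > 1) %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarize(Avg_mpg = mean(mpg))
print(result)

# The masked stats version is still reachable through its namespace.
# stats::filter() applies a linear filter to a series; here, a centred
# 3-point moving average, which yields NA at both ends.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
print(moving_avg)
```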
Summary
Congratulations on reading to the end of this tutorial!
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.