Using the rowwise() function from dplyr in combination with the mutate() function, you can apply a function to every row of a data frame. For example,
data %>%
  rowwise() %>%
  # a, b, c are column names
  mutate(sum_val = sum(a, b, c))
This tutorial will go through how to use dplyr to apply a row-wise function to a table in R with code examples.
Example #1: Sum of Rows
Let’s look at an example that calculates the row-wise sum of a data frame using dplyr. First, we need to install and load dplyr if we do not already have it.
install.packages("dplyr")
library("dplyr")
Next, we will create the data frame:
data <- data.frame(a=c(2, 4, 6, 8, 10), b=c(3, 5, 7, 9, 11), c=c(4, 16, 64, 256, 1024))
data

   a  b    c
1  2  3    4
2  4  5   16
3  6  7   64
4  8  9  256
5 10 11 1024
Once we have the data frame, we can use the following syntax to apply the sum() function to each row:
data %>% rowwise() %>% mutate(sum_cols = sum(a, b, c))
Let’s run the code to get the result:
# A tibble: 5 × 4
# Rowwise:
      a     b     c sum_cols
  <dbl> <dbl> <dbl>    <dbl>
1     2     3     4        9
2     4     5    16       25
3     6     7    64       77
4     8     9   256      273
5    10    11  1024     1045
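To see what rowwise() contributes here, the following sketch runs the same mutate() call with and without it on the data frame from this example. The variable names without_rowwise and with_rowwise are illustrative and not part of the original tutorial, and the trailing ungroup() is an optional extra step. Without rowwise(), sum(a, b, c) receives the three full columns, so every row gets the same grand total.

```r
library(dplyr)

data <- data.frame(a=c(2, 4, 6, 8, 10), b=c(3, 5, 7, 9, 11), c=c(4, 16, 64, 256, 1024))

# Without rowwise(): sum() sees the entire columns a, b and c at once
without_rowwise <- data %>%
  mutate(sum_cols = sum(a, b, c))

# With rowwise(): sum() is evaluated once per row.
# ungroup() drops the row-wise grouping so later verbs behave as usual.
with_rowwise <- data %>%
  rowwise() %>%
  mutate(sum_cols = sum(a, b, c)) %>%
  ungroup()

without_rowwise$sum_cols
with_rowwise$sum_cols
```

The first vector repeats the total of all fifteen values (1429) in every position, while the second holds the per-row sums 9, 25, 77, 273 and 1045.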
Example #2: Mean of Rows
Let’s look at another example of using the rowwise function with mutate to calculate the mean of every row in a data frame.
data <- data.frame(a=c(2, 4, 6, 8, 10), b=c(3, 5, 7, 9, 11), c=c(4, 16, 64, 256, 1024))
data %>%
  rowwise() %>%
  mutate(mean_cols = mean(c(a, b, c)))
Let’s run the code to get the result:
# A tibble: 5 × 4
# Rowwise:
      a     b     c mean_cols
  <dbl> <dbl> <dbl>     <dbl>
1     2     3     4      3
2     4     5    16      8.33
3     6     7    64     25.7
4     8     9   256     91
5    10    11  1024    348.
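Note that this example wraps the columns in c(), whereas Example #1 passed them straight to sum(). The base R sketch below, using the values from the first row (2, 3, 4), illustrates why: sum() adds up all of its arguments, but mean() averages only its first argument. The variable names wrong_mean and right_mean are illustrative.

```r
# sum() adds up every argument it receives
row_sum <- sum(2, 3, 4)

# mean() averages only its first argument; the following positional
# arguments are matched to `trim` and `na.rm`, so 3 and 4 are not data
wrong_mean <- mean(2, 3, 4)

# Combining the values with c() passes them to mean() as one vector
right_mean <- mean(c(2, 3, 4))

row_sum
wrong_mean
right_mean
```

Here wrong_mean comes out as 2 rather than 3, which is why mean(c(a, b, c)) is the safer pattern inside mutate().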
Example #3: Standard Deviation of Rows
Let’s look at another example of using the rowwise function with mutate to calculate the standard deviation of every row in a data frame. We will make the example a bit trickier by including NA values in the data frame. We can set na.rm=TRUE to remove the NA values when calculating the standard deviation.
data <- tibble::as_tibble(data.frame(a=c(NA, 4, 6, 8, 10), b=c(3, 5, 7, 9, NA), c=c(4, 16, NA, 256, 1024)))
data %>%
  rowwise() %>%
  mutate(std_cols = sd(c(a, b, c), na.rm=TRUE))
Let’s run the code to get the result:
# A tibble: 5 × 4
# Rowwise:
      a     b     c std_cols
  <dbl> <dbl> <dbl>    <dbl>
1    NA     3     4    0.707
2     4     5    16    6.66
3     6     7    NA    0.707
4     8     9   256  143.
5    10    NA  1024  717.
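As a small base R sketch of what na.rm is doing, the snippet below takes the first row of this data frame (NA, 3, 4) and computes its standard deviation with and without na.rm=TRUE. The variable names are illustrative, and the last case (a row with a single non-missing value) is a hypothetical row that does not appear in the data above.

```r
row1 <- c(NA, 3, 4)  # first row of the data frame in this example

# Default behaviour: a missing value makes the result NA
sd_default <- sd(row1)

# na.rm = TRUE drops the NA first, leaving the sd of 3 and 4
sd_na_rm <- sd(row1, na.rm = TRUE)

# Hypothetical row with one non-missing value: sd() needs at least
# two values, so the result is NA even with na.rm = TRUE
sd_single <- sd(c(NA, NA, 5), na.rm = TRUE)

sd_default
sd_na_rm
sd_single
```

The value of sd_na_rm rounds to 0.707, matching std_cols for the first row in the output above.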
Summary
Congratulations on reading to the end of this tutorial!
Go to the online courses page on R to learn more about coding in R for data science and machine learning.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.