How to Find the Column Name with the Largest Value for Each Row using R


You can find the column name with the largest value for each row of a data frame by combining the colnames() function with the apply() function.

For example,

df$largest_col <- colnames(df)[apply(df, 1, which.max)]

This tutorial will go through how to perform this task with code examples.


Example

Let’s look at an example. First, we will define a data frame with three columns and ten rows with random integers between 10 and 1000.

x <- sample(10:1000, size = 10)
y <- sample(10:1000, size = 10)
z <- sample(10:1000, size = 10)

df <- data.frame(x, y, z)
df

Let’s run the code to see the data frame.

     x   y   z
1  646 787 662
2  263 690 515
3  984 187 153
4   27 106 814
5  672 225 658
6  289 439 458
7  543 611 526
8  899 272 159
9  701 370 882
10 274 885 564
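Because sample() draws random values, the numbers above will differ on every run. A minimal sketch of how to make the draws repeatable, assuming you want the same data each time, is to call set.seed() before sampling. The seed value 42 and the names x1 and x2 are arbitrary choices for illustration and are not part of the original example.

```r
# Fix the random number generator state so sample() returns the same draws.
# The seed value 42 is an arbitrary, illustrative choice.
set.seed(42)
x1 <- sample(10:1000, size = 10)

# Resetting the same seed reproduces exactly the same draws
set.seed(42)
x2 <- sample(10:1000, size = 10)

# TRUE when both vectors contain the same values in the same order
identical(x1, x2)
```

Setting the seed once at the top of a script is usually enough; it is reset here only to show that the draws repeat.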

We can find the column name with the largest value for each row in the data frame using the colnames() function combined with the apply() function. The colnames() function gets or sets the column names of a matrix-like object. The apply() function applies a function over the rows or columns of an array, matrix, or data frame. The syntax for the apply() function is:

# Syntax
apply(X,       # Array, matrix or data frame
      MARGIN,  # 1: rows, 2: columns, c(1, 2): rows and columns
      FUN,     # Function to apply
      ...)     # Additional arguments to FUN

We can apply a function to every row of a data frame by setting 1 for the MARGIN argument.
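To see what the MARGIN argument does, here is a small sketch on a hand-made matrix. The matrix m and the variable names are made up for illustration.

```r
# A 2 x 3 matrix filled row by row
m <- matrix(c(1, 5, 3,
              9, 2, 4), nrow = 2, byrow = TRUE)

# MARGIN = 1: apply max() to each row
row_max <- apply(m, 1, max)   # 5 9

# MARGIN = 2: apply max() to each column
col_max <- apply(m, 2, max)   # 9 5 4
```

With MARGIN = 1 the result has one value per row, which is the shape we need when adding a new column to a data frame.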

We want to apply the which.max() function to every row to get the index of the column that holds the largest value. If a row contains ties, which.max() returns the index of the first maximum.

We use that column index to get the column name using colnames(df).
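The two steps described above can be sketched separately to make the intermediate result visible. The data frame small below reuses the first three rows of the example output; the names small, idx and largest are illustrative only.

```r
# First three rows of the example data, typed in by hand
small <- data.frame(x = c(646, 263, 984),
                    y = c(787, 690, 187),
                    z = c(662, 515, 153))

# Step 1: position of the largest value in each row (2, 2, 1 here)
idx <- apply(small, 1, which.max)

# Step 2: translate the positions into column names ("y", "y", "x")
largest <- colnames(small)[idx]
largest
```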

df$largest_column <- colnames(df)[apply(df, 1, which.max)]

df

Let’s run the code to get the result:

     x   y   z largest_column
1  646 787 662              y
2  263 690 515              y
3  984 187 153              x
4   27 106 814              z
5  672 225 658              x
6  289 439 458              z
7  543 611 526              y
8  899 272 159              x
9  701 370 882              z
10 274 885 564              y

We successfully updated the data frame with a “largest_column” column that contains the column name with the largest value for each row.
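The one-liner above assumes every column in the data frame is numeric. As a hedged sketch of one way to handle a data frame that also holds non-numeric columns, we can restrict the search to the numeric columns and use base R's max.col() function, which returns the column position of each row maximum of a matrix. The data frame df2, its id column and the variable num_cols are hypothetical and only serve the illustration.

```r
# Hypothetical data frame with a non-numeric id column
df2 <- data.frame(id = c("a", "b", "c"),
                  x = c(646, 263, 984),
                  y = c(787, 690, 187),
                  z = c(662, 515, 153))

# Names of the numeric columns only ("x", "y", "z")
num_cols <- names(df2)[sapply(df2, is.numeric)]

# max.col() works on a matrix; ties.method = "first" picks the first
# maximum in a row, which matches the behaviour of which.max()
df2$largest_column <- num_cols[max.col(as.matrix(df2[num_cols]),
                                       ties.method = "first")]
df2
```

Because the new largest_column column is a character column, it is skipped by the is.numeric() filter, so the code can be run again without the result column interfering.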

Summary

Congratulations on reading to the end of this tutorial!

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!


Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research.