If you try to subset the rows of a data frame without using a comma, R treats the expression as a column selection and can raise the error: undefined columns selected. The syntax for subsetting a data frame is:

dataframe[rows_to_subset, columns_to_subset]

To solve this error, you need to use a comma after the row expression, even if you want to keep all columns. For example,

data[data$col1 > 5, "col1"]

selects the values in col1 that are greater than 5.

This tutorial will go through the error in detail and how to solve it with code examples.

Example: Error in data.frame undefined columns selected

Let’s look at an example with a data frame with two variables.

dat <- data.frame(x = c(0, 1, 2, 3, 4, 5),
                  y = c(11, 2, 5, 7, 9, 3))
dat

  x  y
1 0 11
2 1  2
3 2  5
4 3  7
5 4  9
6 5  3

Let’s try to select the values in column y that are greater than 5:

dat[dat$y>5]

Error in `[.data.frame`(dat, dat$y > 5) : undefined columns selected

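R raises the error because we did not use a comma after the row subset expression. Without the comma, R interprets dat$y > 5 as a column selector, and this six-element logical vector points to columns beyond the two that exist in dat.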
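
To see why the comma matters, the following sketch reuses the dat data frame from above and shows that a single index without a comma selects columns rather than rows. It captures the error message with tryCatch instead of halting; the variable name msg is only illustrative.

```r
dat <- data.frame(x = c(0, 1, 2, 3, 4, 5),
                  y = c(11, 2, 5, 7, 9, 3))

# With no comma, a single index selects columns, not rows
names(dat[2])      # the second column only
names(dat["y"])    # the same column, selected by name

# dat$y > 5 is a logical vector of length 6, but dat has only 2 columns.
# Its TRUE values at positions 4 and 5 point to columns that do not exist.
dat$y > 5

# Capture the error message rather than stopping the script
# (msg is an illustrative name, not part of the original tutorial)
msg <- tryCatch(
  {
    dat[dat$y > 5]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Printing msg should show the same text that appears after the colon in the error above.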

Solution: Use a comma for the row and column expressions

We need to add a comma after the row subset expression to solve this error. Let’s look at the revised code:

dat[dat$y>5, "y"]

Note that we have to put the column name in quotes. Let’s run the code to see the result:

[1] 11  7  9
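
Selecting a single column this way returns a plain vector rather than a data frame. If a data frame is needed, one option is the drop argument of the `[` operator. The sketch below assumes the same dat data frame; the names as_vector and as_frame are illustrative.

```r
dat <- data.frame(x = c(0, 1, 2, 3, 4, 5),
                  y = c(11, 2, 5, 7, 9, 3))

# Default behaviour: a single selected column is simplified to a vector
as_vector <- dat[dat$y > 5, "y"]
is.data.frame(as_vector)

# drop = FALSE keeps the result as a one-column data frame
as_frame <- dat[dat$y > 5, "y", drop = FALSE]
is.data.frame(as_frame)
as_frame
```

Keeping the data frame structure can be useful when the result is passed to code that expects column names.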

If we want to return values from all columns, we can leave the space after the comma blank.

dat[dat$y>5, ]
  x  y
1 0 11
4 3  7
5 4  9

If we know the total number of columns, we can use the equivalent command dat[dat$y>5, 1:2], which lists every column index explicitly. Let’s look at the revised code:

dat[dat$y>5, 1:2]
  x  y
1 0 11
4 3  7
5 4  9

We successfully retrieved the rows where the value in column y is greater than five.
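
If the number of columns is not known in advance, one option is to compute it with ncol. The sketch below assumes the same dat data frame and checks that the three forms return the same rows; the names all_blank, all_range, and all_ncol are illustrative.

```r
dat <- data.frame(x = c(0, 1, 2, 3, 4, 5),
                  y = c(11, 2, 5, 7, 9, 3))

# Three ways to keep every column while filtering rows
all_blank <- dat[dat$y > 5, ]                     # blank column slot
all_range <- dat[dat$y > 5, 1:2]                  # hard-coded column range
all_ncol  <- dat[dat$y > 5, seq_len(ncol(dat))]   # range computed from the data

# Compare the results
isTRUE(all.equal(all_blank, all_range))
isTRUE(all.equal(all_blank, all_ncol))
```

Leaving the column slot blank is the shortest form and does not need updating if columns are added later.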


Congratulations on reading to the end of this tutorial!

Have fun and happy researching!