If you want to call a function on the columns of a data frame or matrix using apply(), you must pass a data frame or matrix as the first argument. If you pass a single column (a vector) instead, you will raise the error: dim(X) must have a positive length.
You can solve this error by passing the whole data frame as the first argument to apply(), for example:
apply(df, 2, sqrt)
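As a minimal sketch of this fix, assuming a small made-up data frame with two numeric columns (the names a and b are purely illustrative), the following shows apply() working when it receives a whole data frame:

```r
# Hypothetical data frame with two numeric columns
df <- data.frame(a = c(1, 4, 9), b = c(16, 25, 36))

# apply() coerces the data frame to a matrix, then applies sqrt() to each column.
# sqrt() returns one value per element, so the result is a 3 x 2 matrix.
res <- apply(df, 2, sqrt)
print(res)
```

Because apply() converts the data frame to a matrix first, this works best when every column is numeric; a character column would turn the whole matrix into character values.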
This tutorial will go through the error in detail and explain how to solve it with code examples.
Apply in R
The apply() function returns a vector, array, or list of values obtained by applying a function to the margins of an array or matrix. The syntax for apply() is as follows:
apply(X, MARGIN, FUN, ...)
X: an array or matrix. If X is a data frame, apply() first coerces it to a matrix with as.matrix().
MARGIN: a vector giving the subscripts to apply the function over. For a matrix, 1 indicates rows, 2 indicates columns, and c(1, 2) indicates rows and columns. Where X has named dimnames, MARGIN can be a character vector selecting dimension names.
FUN: the function to apply.
...: optional arguments to FUN.
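To make MARGIN and ... concrete, here is a short sketch using a hypothetical 2-by-3 sales matrix (the names sales, north, south, and the item labels are invented for illustration):

```r
# Hypothetical 2-by-3 sales matrix: rows are stores, columns are items.
# matrix() fills by column, so the rows are (1, 3, 5) and (2, 4, 6).
sales <- matrix(1:6, nrow = 2,
                dimnames = list(store = c("north", "south"),
                                item  = c("veg", "fruit", "cake")))

row_totals  <- apply(sales, 1, sum)       # MARGIN = 1: one result per row (store)
col_totals  <- apply(sales, 2, sum)       # MARGIN = 2: one result per column (item)
item_totals <- apply(sales, "item", sum)  # named dimnames allow a character MARGIN

# Arguments after FUN are passed through `...` to FUN, here na.rm = TRUE for mean()
with_na   <- matrix(c(1, NA, 3, 4), nrow = 2)
col_means <- apply(with_na, 2, mean, na.rm = TRUE)

print(row_totals)   # north = 9, south = 12
print(col_totals)   # veg = 3, fruit = 7, cake = 11
print(item_totals)  # same values as col_totals
print(col_means)    # 1.0 and 3.5
```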
Let’s look at an example of a data frame.
df <- data.frame(veg_sold = c(20, 40, 104, 75, 99, 10, 200),
                 fruit_sold = c(30, 50, 80, 300, 100, 23, 10),
                 cake_sold = c(10, 100, 500, 20, 450, 100, 900))
df
  veg_sold fruit_sold cake_sold
1       20         30        10
2       40         50       100
3      104         80       500
4       75        300        20
5       99        100       450
6       10         23       100
7      200         10       900
We want to calculate the average number of cakes sold. We attempt to use the apply() function to calculate the mean value in the cake_sold column:
apply(df$cake_sold, 2, mean)
We pass 2 as the second argument, MARGIN, to indicate that we want to apply the function over columns. Let’s run the code to see the result:
Error in apply(df$cake_sold, 2, mean) : dim(X) must have a positive length
The error occurs because apply() expects an array, matrix, or data frame as its first argument. Instead, we have provided a single column. The dim() function is a built-in R function that either sets or returns the dimension of a matrix, array, or data frame. A data frame column extracted with $ is a plain vector with no dimension attribute, so dim(df$cake_sold) returns NULL.
By contrast, the data frame df has seven rows and three columns:
dim(df)
[1] 7 3
Because dim() returns NULL for a vector, length(dim(X)) is zero, which is why the error states: dim(X) must have a positive length.
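The sketch below, which reuses the df from this tutorial, compares the dim() of a column pulled out with $ against a single-bracket subset and the full data frame, and captures the error message with tryCatch() rather than letting it stop the script:

```r
# The tutorial's example data frame
df <- data.frame(veg_sold = c(20, 40, 104, 75, 99, 10, 200),
                 fruit_sold = c(30, 50, 80, 300, 100, 23, 10),
                 cake_sold = c(10, 100, 500, 20, 450, 100, 900))

col_vec <- df$cake_sold       # `$` returns a plain numeric vector
col_df  <- df["cake_sold"]    # single brackets keep a one-column data frame

print(dim(col_vec))  # NULL: a vector has no dim attribute
print(dim(col_df))   # 7 1
print(dim(df))       # 7 3

# apply() checks length(dim(X)) and stops when it is zero;
# tryCatch() returns the error message instead of halting
msg <- tryCatch(apply(col_vec, 2, mean), error = conditionMessage)
print(msg)
```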
Solution #1: Select Columns Using df[c(...)]
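We can keep the column cake_sold as a one-column data frame by subsetting df with single brackets and a vector of column names built with c(). Let’s look at the revised code:
df[c('cake_sold')]
dim(df[c('cake_sold')])
In the above code, we subset the data frame to get the cake_sold column as a data frame rather than a vector, so it has a dimension of 7 1:
  cake_sold
1        10
2       100
3       500
4        20
5       450
6       100
7       900
[1] 7 1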
We can pass this one-column data frame to the apply() function to calculate the mean number of cakes sold:
apply(df[c('cake_sold')], 2, mean)
Let’s run the code to get the result:
cake_sold
 297.1429
If we want to calculate the mean of several specific columns, we can pass all of their names to c():
apply(df[c('cake_sold', 'veg_sold')], 2, mean)
Let’s run the code to see the result:
cake_sold  veg_sold
297.14286  78.28571
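Single brackets with c() are not the only way to keep the subset two-dimensional. The following sketch, again using the tutorial's df, compares Solution #1 with matrix-style subsetting using drop = FALSE and, for the specific case of means, base R's colMeans(); treat it as an illustration rather than a recommendation of one form over the others:

```r
df <- data.frame(veg_sold = c(20, 40, 104, 75, 99, 10, 200),
                 fruit_sold = c(30, 50, 80, 300, 100, 23, 10),
                 cake_sold = c(10, 100, 500, 20, 450, 100, 900))

# List-style single brackets keep a data frame (as in Solution #1)
by_names <- apply(df[c("cake_sold", "veg_sold")], 2, mean)

# Matrix-style subsetting drops a single column to a vector unless drop = FALSE
by_drop <- apply(df[, "cake_sold", drop = FALSE], 2, mean)

# For column means specifically, colMeans() avoids apply() altogether
by_colmeans <- colMeans(df[c("cake_sold", "veg_sold")])

print(by_names)
print(by_drop)
print(by_colmeans)
```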
Solution #2: Call the Function Without apply()
Alternatively, we can call the mean() function directly on df$cake_sold without using apply(), since mean() accepts a vector. Let’s look at the revised code:
mean(df$cake_sold)
Let’s run the code to see the result:
[1] 297.1429
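If the same code sometimes receives a vector and sometimes a data frame, one option is to check dim() before calling apply(). The helper below, col_apply(), is hypothetical (it is not part of base R) and is only a sketch of that idea:

```r
# Hypothetical helper, not part of base R: applies FUN to each column of a
# matrix or data frame, or calls FUN directly on a plain vector.
col_apply <- function(x, FUN, ...) {
  FUN <- match.fun(FUN)
  if (is.null(dim(x))) {
    # No dim attribute: apply() would fail, so call FUN on the whole vector
    return(FUN(x, ...))
  }
  # Input with a dim attribute (e.g. a matrix or data frame): apply over columns
  apply(x, 2, FUN, ...)
}

df <- data.frame(veg_sold = c(20, 40, 104, 75, 99, 10, 200),
                 fruit_sold = c(30, 50, 80, 300, 100, 23, 10),
                 cake_sold = c(10, 100, 500, 20, 450, 100, 900))

vec_result <- col_apply(df$cake_sold, mean)     # vector input
df_result  <- col_apply(df["cake_sold"], mean)  # one-column data frame input
print(vec_result)
print(df_result)
```

Another option for data frames is sapply(df, mean), which iterates over the columns directly because a data frame is a list of columns.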
Congratulations on reading to the end of this tutorial! Generally, this error occurs when you pass a vector, such as a single data frame column extracted with $, as the first argument of apply() instead of an array, matrix, or data frame.
For further reading on R related errors, go to the articles:
- How to Solve R Error: $ operator is invalid for atomic vectors
- How to Solve R Error: Subscript out of bounds
Go to the online courses page on R to learn more about coding in R for data science and machine learning.
Have fun and happy researching!
Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!