How to Solve R Error in colMeans(x, na.rm = TRUE): ‘x’ Must be Numeric

When working with R, the colMeans function is commonly used to calculate the mean of each column in a dataset. However, you might encounter the error:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

This error occurs when non-numeric data is passed to colMeans, which only operates on numeric data. Here, we will explore why this error happens, how to reproduce it, and the steps to resolve it.

Example to Reproduce the Error

Let’s use a small data frame that mixes numeric and non-numeric columns, as real datasets often do.

# Simulating a dataset with mixed data types
employee_data <- data.frame(
  employee_id = c(1001, 1002, 1003, 1004),
  age = c(25, 30, 45, 29),
  salary = c(50000, 60000, 80000, 55000),
  department = c("HR", "Finance", "IT", "Marketing"),
  join_date = as.Date(c("2015-06-15", "2017-01-20", "2013-11-01", "2018-03-12")),
  stringsAsFactors = FALSE
)

# Attempting to calculate column means
colMeans(employee_data, na.rm = TRUE)

Output:

Error in colMeans(employee_data, na.rm = TRUE) : 'x' must be numeric

Here, employee_id, age, and salary are numeric, but department is a character column and join_date is a Date column. colMeans converts a data frame to a matrix before computing anything, and a single non-numeric column turns the whole matrix into character data, so the function throws the error even though most of the columns are numeric.
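Before fixing the error, it can help to check which columns are responsible. The sketch below rebuilds the same example data frame, lists the class of each column, and shows what as.matrix() produces from the mixed data. The names col_classes and m are illustrative only.

```r
# Same example data frame as in the reproduction above
employee_data <- data.frame(
  employee_id = c(1001, 1002, 1003, 1004),
  age = c(25, 30, 45, 29),
  salary = c(50000, 60000, 80000, 55000),
  department = c("HR", "Finance", "IT", "Marketing"),
  join_date = as.Date(c("2015-06-15", "2017-01-20", "2013-11-01", "2018-03-12")),
  stringsAsFactors = FALSE
)

# Class of each column: department is "character" and join_date is "Date"
col_classes <- sapply(employee_data, class)
print(col_classes)

# One non-numeric column is enough to make the whole matrix character
m <- as.matrix(employee_data)
print(typeof(m))      # "character"
print(is.numeric(m))  # FALSE
```

Any column whose class is not numeric or integer is a candidate to drop or convert before calling colMeans.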

Solution: Filter Numeric Columns

To avoid this error, you should filter out non-numeric columns before applying the colMeans function. Below are two approaches you can take.

Solution 1: Automatically Select Numeric Columns

A general solution is to use sapply with is.numeric to identify the numeric columns, then keep only those columns before passing the data to colMeans.

# Filter only numeric columns
numeric_data <- employee_data[sapply(employee_data, is.numeric)]

# Calculate column means
col_means <- colMeans(numeric_data, na.rm = TRUE)
print(col_means)

Output:

employee_id         age      salary 
    1002.50       32.25    61250.00 

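In this solution, sapply(employee_data, is.numeric) returns a logical vector marking which columns are numeric. Indexing the data frame with that vector keeps only those columns, so colMeans operates on numeric data alone. Note that employee_id is included because it is stored as a number, even though the mean of an identifier is rarely meaningful.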
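If you need this in several places, the same idea can be wrapped in a small function. The following is a minimal sketch, not a standard R function: numeric_col_means is a hypothetical helper name, and it uses vapply instead of sapply so that the mask is guaranteed to be a logical vector.

```r
# Same example data frame as above
employee_data <- data.frame(
  employee_id = c(1001, 1002, 1003, 1004),
  age = c(25, 30, 45, 29),
  salary = c(50000, 60000, 80000, 55000),
  department = c("HR", "Finance", "IT", "Marketing"),
  join_date = as.Date(c("2015-06-15", "2017-01-20", "2013-11-01", "2018-03-12")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: column means over the numeric columns only
numeric_col_means <- function(df, na.rm = TRUE) {
  # TRUE for each numeric column, FALSE otherwise
  is_num <- vapply(df, is.numeric, logical(1))
  if (!any(is_num)) {
    stop("no numeric columns found")
  }
  # Single-bracket indexing keeps the result a data frame
  colMeans(df[is_num], na.rm = na.rm)
}

# Inspect the mask, then compute the means
is_num <- vapply(employee_data, is.numeric, logical(1))
print(is_num)

result <- numeric_col_means(employee_data)
print(result)
```

Stopping early when no numeric columns are found is a design choice made for this sketch. Returning an empty numeric vector would be an equally reasonable alternative.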

Solution 2: Exclude Specific Non-Numeric Columns

If you know which columns you need, you can select them by name, which leaves out the non-numeric columns (and, in this case, the employee_id identifier):

# Keep only the numeric columns of interest
numeric_data <- employee_data[, c("age", "salary")]

# Calculate column means
col_means <- colMeans(numeric_data, na.rm = TRUE)
print(col_means)

Output:

     age   salary 
   32.25 61250.00 
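You can also drop the non-numeric columns by name instead of listing the ones to keep. The sketch below does this with setdiff. The names non_numeric, keep, and age_only are illustrative. It also passes drop = FALSE, because selecting a single column with employee_data[, "age"] returns a plain vector, and colMeans then fails with a different error ('x' must be an array of at least two dimensions).

```r
# Same example data frame as above
employee_data <- data.frame(
  employee_id = c(1001, 1002, 1003, 1004),
  age = c(25, 30, 45, 29),
  salary = c(50000, 60000, 80000, 55000),
  department = c("HR", "Finance", "IT", "Marketing"),
  join_date = as.Date(c("2015-06-15", "2017-01-20", "2013-11-01", "2018-03-12")),
  stringsAsFactors = FALSE
)

# Columns known to be non-numeric
non_numeric <- c("department", "join_date")

# Keep every column that is not in the exclusion list
keep <- setdiff(names(employee_data), non_numeric)
numeric_data <- employee_data[, keep, drop = FALSE]
print(colMeans(numeric_data, na.rm = TRUE))

# drop = FALSE keeps a one-column selection as a data frame
age_only <- employee_data[, "age", drop = FALSE]
print(colMeans(age_only, na.rm = TRUE))
```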

Conclusion

The error in colMeans(x, na.rm = TRUE) : 'x' must be numeric in R occurs when non-numeric columns are passed to the colMeans function. By either filtering for numeric columns using sapply or manually excluding non-numeric columns, you can easily resolve this issue and calculate the means of numeric columns.

Congratulations on reading to the end of this tutorial!

Have fun and happy researching!

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
