
How to Solve R Error: Discrete Value Supplied to Continuous Scale


This error occurs when you apply a continuous scale, such as scale_y_continuous(), to a variable that ggplot2 treats as discrete. It typically happens when the y variable in your data is of type character or factor rather than numeric. You can solve this error by using numeric values instead of character or factor values. Alternatively, you can subset the data frame before plotting the data.

This tutorial will go through the error in detail and how to solve it with code examples.


Example

Consider the following data frame that consists of two variables. The first variable contains 100 numeric values, which are samples from the normal distribution. The second variable consists of four categories. Let’s look at the data frame:

set.seed(0)
data <- data.frame(x = rnorm(100),          # Create example data
                   y = c("0-18", "18-35", "35-45", "45+"))
head(data)
           x     y
1  1.2629543  0-18
2 -0.3262334 18-35
3  1.3297993 35-45
4  1.2724293   45+
5  0.4146414  0-18
6 -1.5399500 18-35

Let’s attempt to plot the data using ggplot2.

library("ggplot2") # load ggplot2 package


ggplot(data, aes(x, y)) +             
  geom_point()
Figure: Categorical scatter plot

We successfully created a ggplot2 graph showing the example data.

Next, we will try to change the limits on the y-axis using scale_y_continuous(). We want to exclude values that are in the 45+ category. Let’s look at the code:

ggplot(data, aes(x, y)) +             
  geom_point() + 
  scale_y_continuous(limits = c(0, 45))

Let’s run the code to see what happens:

Error: Discrete value supplied to continuous scale

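The error occurs because the y variable is a character vector, which ggplot2 treats as discrete, whereas scale_y_continuous() can only be applied to numeric data. The numeric limits c(0, 45) have no meaning for category labels such as "0-18".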
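
Before choosing a fix, it can help to confirm the type of the column that is mapped to the y-axis. The following is a minimal sketch in base R that rebuilds the example data frame and inspects the y column:

```r
set.seed(0)
data <- data.frame(x = rnorm(100),
                   y = c("0-18", "18-35", "35-45", "45+"))

class(data$y)        # "character" in R >= 4.0; older versions default to "factor"
is.numeric(data$y)   # FALSE, so a continuous scale cannot be applied to y
is.numeric(data$x)   # TRUE, so x would work with scale_x_continuous()
```

If is.numeric() returns FALSE for a variable, any continuous scale applied to that variable will raise this error.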

Solution #1

We can solve this error by preprocessing our data. We can use subsetting to exclude the values that are in the 45+ category. Let’s look at the additional code:

data_new <- data[data$y != "45+", ] 
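
As a quick sanity check, we can confirm that the subset removed only the rows we intended. This sketch rebuilds the example data and inspects the result; since the four categories are recycled evenly over 100 rows, 75 rows should remain:

```r
set.seed(0)
data <- data.frame(x = rnorm(100),
                   y = c("0-18", "18-35", "35-45", "45+"))
data_new <- data[data$y != "45+", ]

nrow(data_new)       # 75 rows remain out of the original 100
unique(data_new$y)   # only the three categories we want to keep
```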

Next, we will plot the data without using scale_y_continuous. Let’s look at the revised code:

ggplot(data_new, aes(x, y)) +
  geom_point()

Let’s run the code to get the result:

Figure: Categorical scatter plot with one category excluded

We successfully plotted the data with the values in the 45+ category excluded.
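
A possible alternative to subsetting, not covered in the steps above, is to keep the original data frame and use the discrete counterpart of the scale, scale_y_discrete(). The sketch below assumes we want to list the categories to keep explicitly; the vector name keep is chosen here only for illustration. Expect ggplot2 to warn that it removed the rows belonging to the omitted category.

```r
library("ggplot2")

set.seed(0)
data <- data.frame(x = rnorm(100),
                   y = c("0-18", "18-35", "35-45", "45+"))

# Categories to keep on the y-axis; "45+" is deliberately left out
# (`keep` is a hypothetical name used only in this sketch)
keep <- c("0-18", "18-35", "35-45")

p <- ggplot(data, aes(x, y)) +
  geom_point() +
  scale_y_discrete(limits = keep)   # discrete scale, so character values are accepted

p
```

This keeps the data untouched, which can be convenient when the same data frame feeds several plots.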

Solution #2

We can also solve this error by using a numeric y variable instead of a character one. Let’s look at the revised code:

library("ggplot2")

set.seed(0)
data <- data.frame(x = rnorm(100),          
                   y = c(1:4))
head(data)

In the above code, we use a vector of numeric values from 1 to 4 instead of a vector of character values. Let’s look at the data frame:

           x y
1  1.2629543 1
2 -0.3262334 2
3  1.3297993 3
4  1.2724293 4
5  0.4146414 1
6 -1.5399500 2

We can now use scale_y_continuous() to exclude rows that have a y value of 4, which is equivalent to excluding the rows in the 45+ category from the previous data frame. ggplot2 drops the points that fall outside the limits and issues a warning about the removed rows.

ggplot(data, aes(x, y)) +
  geom_point() +
  scale_y_continuous(limits = c(1, 3))

Let’s run the code to get the result:

Figure: Categorical scatter plot with one numeric value excluded
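
If the data already contains character labels, one way to obtain numeric codes like the ones above is to convert the column through a factor. The following is a minimal sketch under that assumption; the names age_levels and y_num are hypothetical and chosen only for illustration. The breaks and labels arguments of scale_y_continuous() are used so that the axis still shows the original category names.

```r
library("ggplot2")

set.seed(0)
data <- data.frame(x = rnorm(100),
                   y = c("0-18", "18-35", "35-45", "45+"))

# Intended category order (hypothetical helper vector for this sketch)
age_levels <- c("0-18", "18-35", "35-45", "45+")

# factor() assigns integer codes in the order of `levels`;
# as.numeric() extracts those codes (1 to 4)
data$y_num <- as.numeric(factor(data$y, levels = age_levels))

p <- ggplot(data, aes(x, y_num)) +
  geom_point() +
  scale_y_continuous(limits = c(1, 3),            # excludes code 4, i.e. "45+"
                     breaks = 1:3,
                     labels = age_levels[1:3])    # show original labels on the axis

p
```

Setting the levels explicitly avoids relying on alphabetical ordering, which may not match the intended order for other label sets.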

Summary

Congratulations on reading to the end of this tutorial!

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!