This error, `line X did not have Y elements`, occurs when you import a data set into R and at least one row of the file has fewer values than R expects, for example because a value is missing.
You can solve this error by checking for stray special characters, making sure every row has the same number of values as the header, or by using the fill argument when reading the file.
This tutorial will go through how to solve the error with code examples.
Example #1
Let’s look at an example to learn how to solve the error. Consider a text file, scores.txt, containing the scores of three players playing three games, where a comma separates each column.
game1,game2,game3
3,4
3,5,7
1,2,4
We can attempt to read the data into a data frame using the read.table function as follows:
df <- read.table('scores.txt', header=TRUE, sep=',')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 3 elements
We get the error because the first data row (the line after the header) has only two values, whereas R expects every row to have three, one for each column in the header.
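Before choosing a fix, it can help to find out which lines are short. A minimal sketch using base R's count.fields, which reports the number of fields on each line of a file. The example data is written to a temporary file (the tempfile() path is a stand-in for scores.txt) so the snippet runs on its own:

```r
# Recreate the example data in a temporary file (stand-in for scores.txt)
path <- tempfile(fileext = ".txt")
writeLines(c("game1,game2,game3", "3,4", "3,5,7", "1,2,4"), path)

# Count the comma-separated fields on every line, header included
counts <- count.fields(path, sep = ",")
print(counts)  # 3 2 3 3

# File lines whose field count differs from the header's
bad_lines <- which(counts != counts[1])
print(bad_lines)  # 2
```

Line 2 of the file is the first data row, which is the row the error message calls line 1, because read.table consumes the header before it starts counting.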
Solution #1
We can solve the error by using the fill argument of the read.table function. When fill is set to TRUE, read.table pads rows that are shorter than the others with blank fields, which become NA in numeric columns. Let’s look at the revised code:
df <- read.table('scores.txt', header=TRUE, sep=',', fill=TRUE)
df
Let’s run the code to see the resultant DataFrame.
  game1 game2 game3
1     3     4    NA
2     3     5     7
3     1     2     4
We successfully read the data set and filled the missing value with an NA.
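One caveat: fill = TRUE can only pad a short row on the right, so it assumes the missing values belong to the last columns. If the gap should really be in, say, game2, the remaining values end up in the wrong columns. A minimal sketch that passes the same data through the text argument (so no file is needed) and shows where the padding goes:

```r
# Same data as scores.txt, supplied as a character vector instead of a file
x <- c("game1,game2,game3", "3,4", "3,5,7", "1,2,4")

df <- read.table(text = x, header = TRUE, sep = ",", fill = TRUE)
print(df)

# The short row was padded at the end: 3 and 4 went to game1 and game2,
# and game3 received NA, regardless of which score was actually missing
print(df[1, ])
```

If the gap belongs in a middle column, mark it explicitly in the file with an empty field (for example 3,,4) so that read.table puts the NA in the right place.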
Solution #2
We can also solve this error by manually editing the text file. This approach may not be suitable if you are handling a data set with many missing values. However, it can be quick and easy for small data sets if you know what the missing value should be.
Let’s look at the revised data set:
game1,game2,game3
3,4,5
3,5,7
1,2,4
With three values in every row, we can now read the data set without the fill argument:
df <- read.table('scores.txt', header=TRUE, sep=',')
df
  game1 game2 game3
1     3     4     5
2     3     5     7
3     1     2     4
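If you would rather make the edit from R than in a text editor, a minimal sketch is to read the lines, overwrite the faulty one and write the file back. The replacement value 5 is taken from the revised data set, and a temporary file stands in for scores.txt:

```r
# Stand-in for scores.txt, still containing the short row
path <- tempfile(fileext = ".txt")
writeLines(c("game1,game2,game3", "3,4", "3,5,7", "1,2,4"), path)

x <- readLines(path)
# Line 2 of the file is the first data row; supply the missing third score
x[2] <- "3,4,5"
writeLines(x, path)

df <- read.table(path, header = TRUE, sep = ",")
print(df)
```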
Example #2
Another common source of the error is an unexpected character in place of the separator. Let’s look at an example where, instead of a comma, a hash (#) separates two values.
game1,game2,game3
3,4#5
3,5,7
1,2,4
Let’s attempt to read the data.
df <- read.table('scores.txt', header=TRUE, sep=',')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 3 elements
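The error occurs because read.table treats # as the start of a comment by default (comment.char = '#'), so it ignores everything from the hash to the end of the line. The first data row is therefore read as 3,4, which has only two elements. If the stray character were not a comment character, for example a semicolon, read.table would instead read 4;5 as a single value, with the same result.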
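By default, read.table treats # as the start of a comment (comment.char = "#"). A minimal sketch that confirms this by reading only the header and the faulty row with fill = TRUE, once with the default and once with comment.char = "", which switches comment handling off:

```r
x <- c("game1,game2,game3", "3,4#5")

# Default comment.char = "#": the text from the hash onwards is dropped, so 5 is lost
with_comments <- read.table(text = x, header = TRUE, sep = ",", fill = TRUE)
print(with_comments)

# comment.char = "": the hash is kept and "4#5" is read as one (character) value
no_comments <- read.table(text = x, header = TRUE, sep = ",",
                          fill = TRUE, comment.char = "")
print(no_comments)
```

Either way the row ends up with only two values, which is why fill = TRUE alone does not recover the score of 5 here.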
Solution
We can solve the error by reading the raw lines of the file with readLines and replacing the hash with a comma using the gsub function.
Let’s look at the revised code:
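# Read the file as a character vector, one element per line
x <- readLines('scores.txt')
# Replace every # that is not at the very start of a line with a comma
y <- gsub('(?<!^)#', ',', x, perl = TRUE)
y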
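We set perl = TRUE because the pattern uses a negative lookbehind, (?<!^), which the default regular expression engine does not support. The lookbehind leaves a # at the start of a line untouched and replaces every other #.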
[1] "game1,game2,game3" "3,4,5"             "3,5,7"
[4] "1,2,4"
Now that the hash is a comma, we can read the data into a data frame by passing the cleaned lines to the text argument of read.table:
df <- read.table(text=y, header=TRUE, sep=',')
df
  game1 game2 game3
1     3     4     5
2     3     5     7
3     1     2     4
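Because the stray character and the separator are both single characters, base R's chartr, which translates characters one for one, is a simpler alternative to a regular expression. Unlike the lookbehind pattern, it also converts a # at the start of a line, so this sketch assumes the file has no such lines:

```r
x <- c("game1,game2,game3", "3,4#5", "3,5,7", "1,2,4")

# Translate every '#' into ',' (no regular expression involved)
fixed <- chartr("#", ",", x)

df <- read.table(text = fixed, header = TRUE, sep = ",")
print(df)
```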
There may also be cases where special characters appear in the data but not in place of the separator, for example:
game1,game2,game3
3?,4,5
3,5#,7
1,2<,4
Here, some of the scores have non-numeric characters next to them. We can use the gsub function to remove these characters as follows:
x <- readLines('scores.txt')
# Delete every ?, # or < (the square brackets form a character class)
y <- gsub('([?#<])', '', x, perl = TRUE)
y
[1] "game1,game2,game3" "3,4,5"             "3,5,7"
[4] "1,2,4"
With the special characters removed, we can read the data into a data frame using read.table as follows:
df <- read.table(text=y, header=TRUE, sep=',')
df
  game1 game2 game3
1     3     4     5
2     3     5     7
3     1     2     4
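To remove other offending characters, add them inside the square brackets of the pattern passed to gsub. Characters such as ], ^, - and the backslash have special meanings inside brackets, so they need escaping or careful placement.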
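When the list of offending characters is long or unknown, an alternative is to invert the pattern: keep only the characters a numeric row can contain and delete everything else. A minimal sketch, assuming the data rows hold only non-negative numbers separated by commas; the header row is left untouched so that its letters survive:

```r
x <- c("game1,game2,game3", "3?,4,5", "3,5#,7", "1,2<,4")

# Leave the header (element 1) alone; in the data rows, delete every
# character that is not a digit, a decimal point or a comma
x[-1] <- gsub("[^0-9.,]", "", x[-1])
print(x)

df <- read.table(text = x, header = TRUE, sep = ",")
print(df)
```

Negative numbers would need a minus sign added to the class (for example "[^0-9.,-]", with the hyphen last so it is treated literally).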
Summary
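To recap, this error means at least one row of the file has fewer values than read.table expects. You can pad short rows with fill = TRUE, fix the file by hand, or clean stray characters with gsub before reading the data.
Congratulations on reading to the end of this tutorial!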
For further reading on R-related errors, go to the articles:
- How to Solve R Error as.Date.numeric(x) : ‘origin’ must be supplied
- How to Solve R Error: Incorrect number of dimensions
- How to Solve R Warning: `summarise()` has grouped output by ‘X’. You can override using the `.groups` argument
- How to Solve R Error: ggplot2 doesn’t know how to deal with data of class character
Go to the online courses page on R to learn more about coding in R for data science and machine learning.
Have fun and happy researching!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.