How to Solve R Error in scan: Line 1 did not have X elements

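This error occurs when you import a data set into R and a row does not contain as many fields as R expects. This typically happens because a value is missing or because an unexpected character has replaced the separator.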

You can solve this error by checking the file for stray special characters, by making sure every row has as many values as there are column headings, or by setting the fill argument when reading the file.

This tutorial will go through how to solve the error with code examples.


Example #1

Let’s look at an example to learn how to solve the error. Consider a text file called scores.txt containing the scores of three players across three games, with a comma separating each column.

game1,game2,game3
3,4
3,5,7
1,2,4

We can attempt to read the data into a data frame using the read.table function as follows:

df <- read.table('scores.txt', header=TRUE, sep=',')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 3 elements

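We get the error because the first data row (the line after the header) has only two values, whereas R expects every row to have three. The line number in the message counts from the first data row, not from the top of the file.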
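Before choosing a fix, it can help to find every line whose field count differs from the header. Here is a minimal sketch using base R's count.fields(). It assumes the same comma-separated layout and writes the example lines to a temporary file, so it runs without scores.txt:

```r
# Contents of scores.txt from Example #1, written to a temporary file
lines <- c("game1,game2,game3",
           "3,4",
           "3,5,7",
           "1,2,4")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Count the comma-separated fields on every line, header included
n_fields <- count.fields(path, sep = ",")
print(n_fields)

# Line numbers (counted from the top of the file) whose field count
# differs from the header's
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

Because count.fields() treats the header as line 1, the offending row shows up as line 2 here, while the scan() message calls it line 1. For a real file, pass its path directly, e.g. count.fields('scores.txt', sep=',').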

Solution #1

We can solve the error by setting the fill argument of read.table to TRUE. With fill=TRUE, rows that have fewer fields than the others are padded with blank fields at the end, and these become NA in numeric columns. Let’s look at the revised code:

df <- read.table('scores.txt', header=TRUE, sep=',', fill=TRUE)
df

Let’s run the code to see the resulting data frame.

  game1 game2 game3
1     3     4    NA
2     3     5     7
3     1     2     4

We successfully read the data set and filled the missing value with an NA.
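The same fix can be tried without a file by passing the lines through the text argument. The sketch below also shows that read.csv() already defaults to sep = ",", header = TRUE and fill = TRUE, so it reads this data without extra arguments. Keep in mind that fill = TRUE assumes the missing values sit at the end of a row: if the gap in 3,4 were really in game2, the 4 would still land in game2.

```r
# Example #1 data, supplied inline instead of reading scores.txt
lines <- c("game1,game2,game3",
           "3,4",
           "3,5,7",
           "1,2,4")

# fill = TRUE pads the short row with a blank field at the END,
# which becomes NA in the integer game3 column
df_table <- read.table(text = lines, header = TRUE, sep = ",", fill = TRUE)
print(df_table)

# read.csv() defaults to sep = ",", header = TRUE and fill = TRUE
df_csv <- read.csv(text = lines)
print(all.equal(df_table, df_csv))
```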

Solution #2

We can also solve this error by manually editing the text file. This approach may not be practical if you are handling a data set with many missing values. However, it is quick and easy for small data sets when you know what the missing value should be.

Let’s look at the revised data set:

game1,game2,game3
3,4,5
3,5,7
1,2,4

With every row now containing three values, read.table reads the file without error:

df <- read.table('scores.txt', header=TRUE, sep=',')
df
  game1 game2 game3
1     3     4     5
2     3     5     7
3     1     2     4
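If you would rather make the same correction from an R script than in a text editor, here is a minimal sketch. It assumes you already know the missing score is 5, and it writes to a temporary file so the example does not touch scores.txt:

```r
# Original contents of scores.txt from Example #1
lines <- c("game1,game2,game3", "3,4", "3,5,7", "1,2,4")

# Supply the known missing score for the first data row (line 2 of the file);
# the value 5 is the one used in the corrected file above
lines[2] <- "3,4,5"

# Write to a temporary file; pass 'scores.txt' instead to overwrite the original
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

df <- read.table(path, header = TRUE, sep = ",")
print(df)
```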

Example #2

Another common source of the error is an unexpected character in place of the separator. Let’s look at an example where a hash (#) separates two values instead of a comma.

game1,game2,game3
3,4#5
3,5,7
1,2,4

Let’s attempt to read the data.

df <- read.table('scores.txt', header=TRUE, sep=',')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 3 elements

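With the comma separator, the first data row cannot be split into three values. By default, read.table treats # as the start of a comment (comment.char = "#"), so it discards #5 and sees only 3 and 4. With comment.char = "" (the read.csv default), 4#5 would instead be read as a single value. Either way, the first data row has only two elements.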
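You can check how the hash is handled by counting fields under both comment settings with base R's count.fields(). The following minimal sketch writes the Example #2 lines to a temporary file instead of scores.txt. It also shows that read.csv() does not stop on this file but silently returns a data frame with the wrong shape of data:

```r
# Example #2 data, written to a temporary file in place of scores.txt
lines <- c("game1,game2,game3", "3,4#5", "3,5,7", "1,2,4")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Default comment.char = "#": "#5" is dropped, leaving the fields 3 and 4
with_comment <- count.fields(path, sep = ",")
print(with_comment)

# comment.char = "": "4#5" survives as one field, so the row still has two
no_comment <- count.fields(path, sep = ",", comment.char = "")
print(no_comment)

# read.csv() (fill = TRUE, comment.char = "") pads game3 with NA and
# reads game2 as text because of the "4#5" value
df_csv <- read.csv(path)
str(df_csv)
```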

Solution

We can solve the error by reading the raw lines with readLines and replacing the stray hash with a comma using the gsub function.

Let’s look at the revised code:

x <- readLines('scores.txt')
y <- gsub('(?<!^)#', ',', x, perl = TRUE)
y

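[1] "game1,game2,game3" "3,4,5"             "3,5,7"
[4] "1,2,4"

We set perl=TRUE because the pattern uses a Perl-style negative lookbehind, (?<!^), which stops gsub from replacing a # that appears at the very start of a line.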

With the hash replaced by a comma, we can pass the cleaned lines to read.table through its text argument:

df <- read.table(text=y, header=TRUE, sep=',')

df
  game1 game2 game3
1     3     4     5
2     3     5     7
3     1     2     4
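If you meet this problem often, the steps above can be wrapped in a small function. The sketch below uses a hypothetical helper, read_with_fixed_sep(), which is not part of base R. It takes the stray separator as a regular expression, so one call can handle several characters. The semicolon in the sample data is an illustrative assumption:

```r
# Hypothetical helper (not part of base R): replace a stray separator, given as
# a regular expression, with a comma unless it is the first character of a
# line, then read the cleaned lines
read_with_fixed_sep <- function(lines, stray = "#") {
  # (?<!^) is a negative lookbehind, so perl = TRUE is required
  pattern <- paste0("(?<!^)", stray)
  cleaned <- gsub(pattern, ",", lines, perl = TRUE)
  read.table(text = cleaned, header = TRUE, sep = ",")
}

# Example #2 data, plus a semicolon on the second data row for illustration
lines <- c("game1,game2,game3", "3,4#5", "3,5;7", "1,2,4")
df <- read_with_fixed_sep(lines, stray = "[#;]")
print(df)
```

For a file on disk, call read_with_fixed_sep(readLines('scores.txt')).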

In other cases, special characters appear alongside the values rather than in place of the separator. For example:

game1,game2,game3
3?,4,5
3,5#,7
1,2<,4

Here, non-numeric characters sit next to some of the scores. We can use the gsub function to delete them as follows:

x <- readLines('scores.txt')
y <- gsub('[?#<]', '', x)  # delete every ?, # and < character
y
[1] "game1,game2,game3" "3,4,5"             "3,5,7"            
[4] "1,2,4"

With the special characters removed, we can read the data into a data frame using read.table as follows:

df <- read.table(text=y, header=TRUE, sep=',')

df
  game1 game2 game3
1     3     4     5
2     3     5     7
3     1     2     4

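To remove other unwanted characters, add them inside the square brackets of the pattern passed to gsub. Most characters are literal inside brackets, but ], ^ and - need care: put ] first, ^ anywhere except first, and - last.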
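Listing every offending character can get tedious. An alternative, swapped in here rather than taken from the examples above, is to whitelist: keep only the characters a numeric row should contain and drop everything else. This minimal sketch assumes every data column is numeric. It would also strip text such as NA, so it does not suit files that use words in the data rows:

```r
# Third example's data, supplied inline instead of reading scores.txt
lines <- c("game1,game2,game3", "3?,4,5", "3,5#,7", "1,2<,4")

# Keep the header unchanged; in the data rows, delete every character that is
# not a digit, comma, decimal point or minus sign
data_rows <- gsub("[^0-9,.-]", "", lines[-1])
cleaned <- c(lines[1], data_rows)

df <- read.table(text = cleaned, header = TRUE, sep = ",")
print(df)
```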

Summary

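Congratulations on reading to the end of this tutorial! The error "line X did not have Y elements" means that scan, which read.table calls internally, found a row with fewer values than expected. You can fix it by setting fill=TRUE, by correcting the file by hand, or by cleaning stray characters out of the lines with gsub before reading them.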

Have fun and happy researching!

Senior Advisor, Data Science

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
