This tutorial shows how to count the number of missing values (NAs) in a data frame in R.


Example

Let’s look at an example using the built-in airquality dataset.

Get Airquality Data

First, let’s look at the head of the airquality dataset.

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

We can see that there are NA values in the data frame, but we need to determine how many there are.

Solution #1: Use summary

The simplest way to see the number of NAs in each column of the data frame is to call the summary function on it:

summary(airquality)
   Ozone           Solar.R           Wind             Temp           Month      
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00   Min.   :5.000  
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00   1st Qu.:6.000  
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00   Median :7.000  
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88   Mean   :6.993  
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00   3rd Qu.:8.000  
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00   Max.   :9.000  
 NA's   :37       NA's   :7                                                       
      Day      
 Min.   : 1.0  
 1st Qu.: 8.0  
 Median :16.0  
 Mean   :15.8  
 3rd Qu.:23.0  
 Max.   :31.0  
              

The summary function returns descriptive statistics for each column in the data frame and, for columns that contain missing values, the number of NAs. We can see there are 37 NA values in Ozone and 7 NA values in Solar.R.
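If the count is needed as a value rather than read off the printed table, one option is to summarise a single column and pick out the NA's entry by name. This is a minimal sketch; the variable names ozone_summary and ozone_na are illustrative, and the lookup only works for columns that actually contain missing values.

```r
# Summarise one column; the result behaves like a named numeric vector
ozone_summary <- summary(airquality$Ozone)

# The "NA's" element is only present when the column has missing values
ozone_na <- ozone_summary[["NA's"]]
print(ozone_na)
```

For a column without NAs, such as Wind, the NA's entry is absent, so this lookup would fail there.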

Solution #2: Use sum and is.na

The second approach gets the total number of NAs in the data frame. Calling is.na() returns TRUE or FALSE for each value in the data frame, and sum() counts the TRUE values. Let’s look at the code:

sum(is.na(airquality))

Let’s run the code to see the result:

[1] 44

There is a total of 44 NAs in the data frame.
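To see why this works, it can help to look at the intermediate result on a small example. The sketch below uses a toy data frame (hypothetical, not part of airquality) to show that is.na() produces a logical matrix and that sum() treats each TRUE as 1.

```r
# Toy data frame (hypothetical) with three missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))

# is.na() on a data frame returns a logical matrix of the same shape
na_mask <- is.na(df)
print(na_mask)

# sum() coerces TRUE to 1 and FALSE to 0, so the total is the NA count
total_na <- sum(na_mask)
print(total_na)
```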

Solution #3: Use sum and is.na in a Function

If we want to get the number of NAs per column in a data frame, we can define a function that iterates over each column and counts the NAs using sum() and is.na(). Let’s look at the code:

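f <- function(x) {
  # Accumulate one row per column; start empty
  res <- NULL
  for (i in seq_len(ncol(x))) {
    # Count the NAs in column i
    temp <- sum(is.na(x[, i]))
    temp <- as.data.frame(temp)
    # Record which column the count belongs to
    temp$var <- colnames(x)[i]
    res <- rbind(res, temp)
  }
  return(res)
}

Let’s call the function to see the result:

f(airquality)
  temp     var
1   37   Ozone
2    7 Solar.R
3    0    Wind
4    0    Temp
5    0   Month
6    0     Day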

There are 37 NAs in the first column (Ozone) and 7 NAs in the second column (Solar.R). The remaining columns have no NAs.
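The same per-column counts can also be obtained without an explicit loop. The sketch below is one possible alternative, not part of the original solution: it applies colSums() to the logical matrix from is.na(), and shows an equivalent sapply() call. The variable names are illustrative.

```r
# Count NAs per column by summing each column of the logical matrix
na_per_column <- colSums(is.na(airquality))
print(na_per_column)

# Equivalent: apply sum(is.na(.)) to each column of the data frame
na_per_column_sapply <- sapply(airquality, function(col) sum(is.na(col)))
print(na_per_column_sapply)
```

Both results are named vectors, so a single count can be looked up by column name, for example na_per_column[["Ozone"]].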

Summary

Congratulations on reading to the end of this tutorial!

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

For further reading on data analysis with R, go to the article: How to Download and Plot Stock Prices with quantmod in R

Have fun and happy researching!