*This tutorial will go through counting the number of missing values or NAs in a data frame in R.*

## Example

Let’s look at an example using the built-in `airquality` dataset.

### Get Airquality Data

First, let’s look at the head of the `airquality` dataset.

```r
head(airquality)
```

```
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
```

We can see that there are `NA` values in the data frame, but we need to determine how many there are.

### Solution #1: Use summary

The simplest way to see the number of `NA` values in each column of the data frame is to use the `summary` function. Let’s call `summary` on the data frame:

```r
summary(airquality)
```

```
     Ozone           Solar.R           Wind             Temp           Month      
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00   Min.   :5.000  
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00   1st Qu.:6.000  
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00   Median :7.000  
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88   Mean   :6.993  
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00   3rd Qu.:8.000  
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00   Max.   :9.000  
 NA's   :37       NA's   :7                                                       
      Day      
 Min.   : 1.0  
 1st Qu.: 8.0  
 Median :16.0  
 Mean   :15.8  
 3rd Qu.:23.0  
 Max.   :31.0  
```

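The `summary` function returns a statistical summary of each column in the data frame, including the number of `NA` values in any column that has them. We can see there are 37 `NA` values in `Ozone` and 7 `NA` values in `Solar.R`. The remaining columns show no `NA's` row because they contain no missing values.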
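If only one column is of interest, `summary` can also be called on that column alone. The following is a minimal sketch, assuming we only care about `Ozone`; the variable names `ozone_summary` and `ozone_na` are introduced here purely for illustration. For a numeric vector that contains missing values, the result includes an element named `NA's`.

```r
# Summarise a single column; the result is a named numeric vector
ozone_summary <- summary(airquality$Ozone)
print(ozone_summary)

# Pull out just the NA count by name
# (this element is only present when the column has missing values)
ozone_na <- ozone_summary[["NA's"]]
print(ozone_na)
```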

### Solution #2: Use sum and is.na

The second way to get the total number of `NA` values in the data frame is to call `is.na`, which returns `TRUE` or `FALSE` for each value in the data frame, and pass the result to `sum()`, which counts the `TRUE` values. Let’s look at the code:

```r
sum(is.na(airquality))
```

Let’s run the code to see the result:

```
[1] 44
```

There is a total of 44 `NA` values in the data frame.
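To see why this works, it may help to try the same idea on a small vector first. The sketch below uses a made-up vector `v` (not part of the `airquality` example): `is.na` yields a logical vector, and `sum` treats each `TRUE` as 1 and each `FALSE` as 0.

```r
# Hypothetical toy vector, used only for illustration
v <- c(41, NA, 12, NA, 28)

# is.na() flags each missing element with TRUE
na_flags <- is.na(v)
print(na_flags)

# sum() coerces TRUE to 1 and FALSE to 0, so it counts the NAs
na_count <- sum(na_flags)
print(na_count)
```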

### Solution #3: Use sum and is.na in a Function

If we want to get the number of `NA` values per column in a data frame, we can define a function that iterates over each column and counts the `NA` values using `sum()` and `is.na`. Let’s look at the code:

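```r
f <- function(x) {
  # Accumulator for the per-column results
  res <- NULL
  for (i in seq_len(ncol(x))) {
    # Count the NA values in column i
    temp <- sum(is.na(x[, i]))
    temp <- as.data.frame(temp)
    # Record the column name alongside its NA count
    temp$var <- colnames(x)[i]
    res <- rbind(res, temp)
  }
  return(res)
}
```

Let’s call the function to see the result:

```r
f(airquality)
```

```
  temp     var
1   37   Ozone
2    7 Solar.R
3    0    Wind
4    0    Temp
5    0   Month
6    0     Day
```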

There are 37 `NA` values in the first column (`Ozone`) and 7 `NA` values in the second column (`Solar.R`). The other four columns have none.
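As a possible shortcut, the same per-column counts can usually be obtained without writing a loop. The sketch below is one alternative rather than a replacement for the function above: for a data frame, `is.na` returns a logical matrix, and base R's `colSums` adds up each column. The names `na_per_column` and `total_na` are introduced here for illustration.

```r
# Count NA values per column without an explicit loop
na_per_column <- colSums(is.na(airquality))
print(na_per_column)

# The per-column counts should add up to the overall total
total_na <- sum(na_per_column)
print(total_na)
```

The result is a named numeric vector, so an individual count can be looked up by column name, for example `na_per_column[["Ozone"]]`.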

## Summary

