*This tutorial will go through counting the number of missing values or NAs in a data frame in R.*

## Example

Let’s look at an example using the built-in `airquality` dataset.

### Get Airquality Data

First, let’s look at the head of the `airquality` dataset.

    head(airquality)

      Ozone Solar.R Wind Temp Month Day
    1    41     190  7.4   67     5   1
    2    36     118  8.0   72     5   2
    3    12     149 12.6   74     5   3
    4    18     313 11.5   62     5   4
    5    NA      NA 14.3   56     5   5
    6    28      NA 14.9   66     5   6

We can see that there are `NA` values in the data frame, but we need to determine how many there are.

### Solution #1: Use summary

The simplest way to get the number of `NAs` in the data frame is to use the `summary` function. Let’s call `summary` on the data frame:

    summary(airquality)

         Ozone           Solar.R           Wind             Temp           Month
     Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00   Min.   :5.000
     1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00   1st Qu.:6.000
     Median : 31.50   Median :205.0   Median : 9.700   Median :79.00   Median :7.000
     Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88   Mean   :6.993
     3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00   3rd Qu.:8.000
     Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00   Max.   :9.000
     NA's   :37       NA's   :7
          Day
     Min.   : 1.0
     1st Qu.: 8.0
     Median :16.0
     Mean   :15.8
     3rd Qu.:23.0
     Max.   :31.0

The `summary` function returns summary statistics for each column in the data frame, along with the number of `NAs` in any column that contains them. We can see there are 37 `NA` values in `Ozone` and 7 `NA` values in `Solar.R`.
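If the count is needed as a number rather than read off the printed table, one option is to call `summary` on a single column and pull out its `NA's` entry. The following is a minimal sketch, not the only way to do it: the variable names `ozone_summary` and `ozone_nas` are illustrative, and it assumes the column is numeric and contains at least one `NA` (otherwise `summary` does not include an `NA's` entry).

```r
# Summarise one numeric column; the result is a named vector.
ozone_summary <- summary(airquality$Ozone)

# The "NA's" entry is present only when the column has missing values.
ozone_nas <- ozone_summary[["NA's"]]
print(ozone_nas)
```

For `Ozone` this matches the 37 shown in the `summary` table.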

### Solution #2: Use sum and is.na

The second way to get the total number of `NAs` in the data frame is to call `is.na`, which returns `TRUE` or `FALSE` for each value in the data set, and then `sum()`, which counts the `TRUE` values. Let’s look at the code:

    sum(is.na(airquality))

Let’s run the code to see the result:

    [1] 44

There is a total of 44 `NAs` in the data frame.
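To see why this works, it can help to look at the intermediate result on a small example. The sketch below uses a made-up data frame `df` (not part of the tutorial's data) to show that `is.na` returns a logical matrix and that `sum()` treats each `TRUE` as 1. As an aside, `mean()` on the same matrix gives the proportion of missing cells.

```r
# A small, made-up data frame with three missing values.
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))

# is.na() returns a logical matrix with the same shape as the data frame.
na_mask <- is.na(df)
print(na_mask)

# sum() counts TRUE as 1 and FALSE as 0, giving the total number of NAs.
total_nas <- sum(na_mask)

# mean() on the same matrix gives the proportion of missing cells.
prop_nas <- mean(na_mask)
```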

### Solution #3: Use sum and is.na in a Function

If we want to get the number of `NAs` per column in a data frame, we can define a function that iterates over each column and counts the `NAs` using `sum()` and `is.na`. Let’s look at the code:

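    f <- function(x) {
      # Accumulate one row per column, starting from an empty result
      res <- NULL
      for (i in seq_len(ncol(x))) {
        # Count the NAs in column i
        temp <- sum(is.na(x[, i]))
        temp <- as.data.frame(temp)
        # Record the column name alongside its count
        temp$var <- colnames(x)[i]
        res <- rbind(res, temp)
      }
      return(res)
    }

Let’s call the function to see the result:

    f(airquality)

      temp     var
    1   37   Ozone
    2    7 Solar.R
    3    0    Wind
    4    0    Temp
    5    0   Month
    6    0     Day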

There are 37 `NAs` in the first column and 7 `NAs` in the second column.
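The same per-column counts can also be obtained without writing a loop. As a sketch of one alternative, base R's `colSums` can be applied to the logical matrix returned by `is.na`, and `sapply` can apply the `sum(is.na(...))` idea column by column. The variable names below are illustrative.

```r
# Count NAs per column without an explicit loop.
na_per_column <- colSums(is.na(airquality))
print(na_per_column)

# The same counts with sapply(), applied to each column in turn.
na_per_column_sapply <- sapply(airquality, function(col) sum(is.na(col)))
```

Both results are named vectors, so a single column's count can be looked up by name, for example `na_per_column[["Ozone"]]`.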

## Summary

Congratulations on reading to the end of this tutorial!

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

For further reading on data analysis with R, go to the article: How to Download and Plot Stock Prices with quantmod in R

Have fun and happy researching!

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research.