There are a number of ways in R to count NAs (missing values). A common use case is to count the NAs over multiple columns, ie., a whole dataframe. That’s basically the question “how many NAs are there in each column of my dataframe”? This post demonstrates some ways to answer this question.
Way 1: using sapply
A typical way (or classical way) in R to achieve some iteration is using apply
and friends. sapply
renders through a list and simplifies (hence the “s” in sapply) if possible.
sapply(mtcars, function(x) sum(is.na(x)))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 0 0 0 0 0 0 0 0 0 0 0
Pros: Straightforward. No dependencies on other packages. Tried and true.
Cons: Not typestable; not sure you will always get the same data type back from this function. You might be surprised and get something you did not expect. That’s no problem in interactive use, but you’d not want for programming.
Way 2: using purrr::map
map
maps (applies) a function to each element of a vector/list. Here, the following code reads as “Apply ‘sum(is.na(.))’ on each column of mtcars”. Mind the tilde ~
before function. The dot
.` refers to the respective column.
library(tidyverse)
map(mtcars, ~sum(is.na(.)))
#> $mpg
#> [1] 0
#>
#> $cyl
#> [1] 0
#>
#> $disp
#> [1] 0
#>
#> $hp
#> [1] 0
#>
#> $drat
#> [1] 0
#>
#> $wt
#> [1] 0
#>
#> $qsec
#> [1] 0
#>
#> $vs
#> [1] 0
#>
#> $am
#> [1] 0
#>
#> $gear
#> [1] 0
#>
#> $carb
#> [1] 0
Pros: Modern, cool. Straightforward. Typestable. Complex function with lots of use cases.
Cons: Learning curve. Depends on package tidyverse
.
Way 3: using dplyr
The following code can be translated as something like this:
1. Hey R, take mtcars -and then-
2. Select all columns (if I'm in a good mood tomorrow, I might select fewer) -and then-
3. Summarise all selected columns by using the function 'sum(is.na(.))'
The dot .
refers to what was handed over by the pipe, ie., the output of the last step.
mtcars %>%
select(everything()) %>% # replace to your needs
summarise_all(funs(sum(is.na(.))))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 0 0 0 0 0 0 0 0 0 0 0
Pros: Straightforward. Quite simple (simple than map
to me).
Cons: Depends on package tidyverse
.
Way 4: Counting NAs rowwise using apply
Sometimes it is useful to count the NAs rowwise (case by case). apply
allows for applying a function to each row of a dataframe (that the MARGIN
parameter).
apply(mtcars, MARGIN = 1, function(x) sum(is.na(x)))
#> Mazda RX4 Mazda RX4 Wag Datsun 710
#> 0 0 0
#> Hornet 4 Drive Hornet Sportabout Valiant
#> 0 0 0
#> Duster 360 Merc 240D Merc 230
#> 0 0 0
#> Merc 280 Merc 280C Merc 450SE
#> 0 0 0
#> Merc 450SL Merc 450SLC Cadillac Fleetwood
#> 0 0 0
#> Lincoln Continental Chrysler Imperial Fiat 128
#> 0 0 0
#> Honda Civic Toyota Corolla Toyota Corona
#> 0 0 0
#> Dodge Challenger AMC Javelin Camaro Z28
#> 0 0 0
#> Pontiac Firebird Fiat X1-9 Porsche 914-2
#> 0 0 0
#> Lotus Europa Ford Pantera L Ferrari Dino
#> 0 0 0
#> Maserati Bora Volvo 142E
#> 0 0
Pros: Straightforward.
Cons: ?
Way 5: Counting NAs rowwise using dplyr
mtcars %>%
rowwise %>%
summarise(NA_per_row = sum(is.na(.)))
#> # A tibble: 32 x 1
#> NA_per_row
#> <int>
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
#> 7 0
#> 8 0
#> 9 0
#> 10 0
#> # ... with 22 more rows
Pro: Fits into the pipe thinking.
Cons: Somewhat less common.
Debrief
Counting NAs over all columns of a dataframe is quite a common task. Like always (?) in R, there a multiple ways to achieve it. Some where shown here (more exist). Enjoy!