Finding NAs in multiple columns (per row)

Assume you would like to check for missing data, not in one column only but in several columns at once.

First, data and some packages:

data(mtcars)
library(tidyverse)

Then, let’s introduce some missing data:

mtcars[c(1,2), 1] <- NA
mtcars[c(1, 3:4), 2] <- NA

Don’t check columns individually

Of course, you do not want to repeat yourself and check each column individually, like this:

sum(is.na(mtcars[[1]]))
#> [1] 2
sum(is.na(mtcars[, 1]))  # same
#> [1] 2

Nor would one like to check each row individually:

sum(is.na(mtcars[1, ]))
#> [1] 2

Apply a function to each column

We need to apply() the function above to each column (or row). map() works similarly to apply() but has some niceties included (map() comes from the R package purrr). Given a data frame, map() iterates over its columns.

mtcars %>% 
  map(~sum(is.na(.)))
#> $mpg
#> [1] 2
#> 
#> $cyl
#> [1] 3
#> 
#> $disp
#> [1] 0
#> 
#> $hp
#> [1] 0
#> 
#> $drat
#> [1] 0
#> 
#> $wt
#> [1] 0
#> 
#> $qsec
#> [1] 0
#> 
#> $vs
#> [1] 0
#> 
#> $am
#> [1] 0
#> 
#> $gear
#> [1] 0
#> 
#> $carb
#> [1] 0

Note that ~ is purrr's shorthand for an anonymous function such as function(x), just less verbose. Inside it, . refers to each element of mtcars, i.e., each column.
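If a list is more than you need, the per-column counts can also be collected in a named vector. The following is a minimal sketch in base R; the name df is only an illustrative copy of mtcars with the same NAs as above.

```r
# Work on a copy so the built-in dataset stays untouched
df <- mtcars
df[c(1, 2), 1] <- NA      # two NAs in mpg
df[c(1, 3:4), 2] <- NA    # three NAs in cyl

# colSums() adds up the TRUEs of the logical matrix, column by column
na_per_col <- colSums(is.na(df))

# vapply() does the same per column and enforces an integer result
na_per_col2 <- vapply(df, function(x) sum(is.na(x)), integer(1))

# Show only the columns that actually contain NAs
na_per_col[na_per_col > 0]
#> mpg cyl 
#>   2   3
```

A named vector is easy to filter or sort, which helps when a data frame has many columns.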

Count NAs per row

Now assume you want to know the number of missing values per case (i.e., row). One way is this:

mtcars %>% 
  mutate(NA_count = rowSums(is.na(.))) %>% 
  head()
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb NA_count
#> 1   NA  NA  160 110 3.90 2.620 16.46  0  1    4    4        2
#> 2   NA   6  160 110 3.90 2.875 17.02  0  1    4    4        1
#> 3 22.8  NA  108  93 3.85 2.320 18.61  1  1    4    1        1
#> 4 21.4  NA  258 110 3.08 3.215 19.44  1  0    3    1        1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2        0
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1        0

Here, the dot . refers to the data frame as of the last pipe step. In this case, it’s just the plain data frame mtcars.

Of course, the pipe is not needed:

rowSums(is.na(mtcars))
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710 
#>                   2                   1                   1 
#>      Hornet 4 Drive   Hornet Sportabout             Valiant 
#>                   1                   0                   0 
#>          Duster 360           Merc 240D            Merc 230 
#>                   0                   0                   0 
#>            Merc 280           Merc 280C          Merc 450SE 
#>                   0                   0                   0 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
#>                   0                   0                   0 
#> Lincoln Continental   Chrysler Imperial            Fiat 128 
#>                   0                   0                   0 
#>         Honda Civic      Toyota Corolla       Toyota Corona 
#>                   0                   0                   0 
#>    Dodge Challenger         AMC Javelin          Camaro Z28 
#>                   0                   0                   0 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2 
#>                   0                   0                   0 
#>        Lotus Europa      Ford Pantera L        Ferrari Dino 
#>                   0                   0                   0 
#>       Maserati Bora          Volvo 142E 
#>                   0                   0
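If only some of the columns matter, the same idea works on a subset. This is a small sketch; df is an illustrative copy of mtcars, and the choice of columns in cols is arbitrary.

```r
# Copy of mtcars with the same NAs as above
df <- mtcars
df[c(1, 2), 1] <- NA
df[c(1, 3:4), 2] <- NA

# Hypothetical selection of columns to check
cols <- c("mpg", "cyl", "disp")

# Count NAs per row, but only within the selected columns
na_per_row <- rowSums(is.na(df[, cols]))

head(na_per_row, 4)
#>      Mazda RX4  Mazda RX4 Wag     Datsun 710 Hornet 4 Drive 
#>              2              1              1              1
```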

A more classical R way consists of the following:

  1. Apply a function over each row of the data frame.
  2. Here, this function is sum(is.na(x)), where x refers to each row.

apply(mtcars, MARGIN = 1, FUN = function(x) sum(is.na(x)))
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710 
#>                   2                   1                   1 
#>      Hornet 4 Drive   Hornet Sportabout             Valiant 
#>                   1                   0                   0 
#>          Duster 360           Merc 240D            Merc 230 
#>                   0                   0                   0 
#>            Merc 280           Merc 280C          Merc 450SE 
#>                   0                   0                   0 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
#>                   0                   0                   0 
#> Lincoln Continental   Chrysler Imperial            Fiat 128 
#>                   0                   0                   0 
#>         Honda Civic      Toyota Corolla       Toyota Corona 
#>                   0                   0                   0 
#>    Dodge Challenger         AMC Javelin          Camaro Z28 
#>                   0                   0                   0 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2 
#>                   0                   0                   0 
#>        Lotus Europa      Ford Pantera L        Ferrari Dino 
#>                   0                   0                   0 
#>       Maserati Bora          Volvo 142E 
#>                   0                   0
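To convince yourself that both approaches agree, you can compare their results directly. This is a short sketch on an illustrative copy named df; note that apply() returns integers whereas rowSums() returns doubles, so all.equal() is used rather than identical().

```r
# Copy of mtcars with the same NAs as above
df <- mtcars
df[c(1, 2), 1] <- NA
df[c(1, 3:4), 2] <- NA

# Vectorised version
by_rowsums <- rowSums(is.na(df))

# Row-wise version with apply()
by_apply <- apply(df, MARGIN = 1, FUN = function(x) sum(is.na(x)))

# Same values and names; only the storage type differs
all.equal(by_rowsums, by_apply)
#> [1] TRUE
```

On large data frames, the vectorised rowSums() is usually the faster of the two.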

Count NAs in the whole data frame

Note that is.na() also accepts a data.frame as input. It returns a logical matrix with the same dimensions.

is.na(mtcars) %>% 
  head()
#>                     mpg   cyl  disp    hp  drat    wt  qsec    vs    am
#> Mazda RX4          TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> Mazda RX4 Wag      TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> Datsun 710        FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> Hornet 4 Drive    FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> Hornet Sportabout FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> Valiant           FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>                    gear  carb
#> Mazda RX4         FALSE FALSE
#> Mazda RX4 Wag     FALSE FALSE
#> Datsun 710        FALSE FALSE
#> Hornet 4 Drive    FALSE FALSE
#> Hornet Sportabout FALSE FALSE
#> Valiant           FALSE FALSE

Note that sum() accepts this logical matrix as input and counts every TRUE as 1:

is.na(mtcars) %>% 
  sum() 
#> [1] 5
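Because TRUE counts as 1 and FALSE as 0, mean() on the same logical matrix gives the share of missing cells. A minimal sketch, again on an illustrative copy named df:

```r
# Copy of mtcars with the same NAs as above
df <- mtcars
df[c(1, 2), 1] <- NA
df[c(1, 3:4), 2] <- NA

# Total number of missing cells
total_na <- sum(is.na(df))

# Proportion of missing cells: 5 out of 32 * 11 = 352
share_na <- mean(is.na(df))

total_na
#> [1] 5
round(share_na, 3)
#> [1] 0.014
```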

Some musings

To inspect missing values (or cases with NAs), filter() may be a solution:

mtcars %>% 
  filter(is.na(mpg) |  is.na(cyl) | is.na(disp))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1   NA  NA  160 110 3.90 2.620 16.46  0  1    4    4
#> 2   NA   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8  NA  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4  NA  258 110 3.08 3.215 19.44  1  0    3    1

More conveniently, complete.cases() checks all columns at once and returns TRUE for each row without any NA:

complete.cases(mtcars)
#>  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#> [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#> [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
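The logical vector is handy for indexing. As a small sketch (df is an illustrative copy of mtcars), this retrieves the positions and row names of the incomplete cases:

```r
# Copy of mtcars with the same NAs as above
df <- mtcars
df[c(1, 2), 1] <- NA
df[c(1, 3:4), 2] <- NA

# TRUE for every row with at least one NA
incomplete <- !complete.cases(df)

# Positions of the incomplete rows
which(incomplete)
#> [1] 1 2 3 4

# Their row names
rownames(df)[incomplete]
#> [1] "Mazda RX4"      "Mazda RX4 Wag"  "Datsun 710"     "Hornet 4 Drive"
```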

And the complement, i.e., the rows with at least one NA:

mtcars %>% 
  filter(!complete.cases(mtcars))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1   NA  NA  160 110 3.90 2.620 16.46  0  1    4    4
#> 2   NA   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8  NA  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4  NA  258 110 3.08 3.215 19.44  1  0    3    1
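Chaining is.na() calls with | gets tedious for many columns. One possible alternative is dplyr's if_any(), sketched below; this assumes dplyr 1.0.4 or later, and df is an illustrative copy of mtcars.

```r
library(dplyr)

# Copy of mtcars with the same NAs as above
df <- mtcars
df[c(1, 2), 1] <- NA
df[c(1, 3:4), 2] <- NA

# Rows with an NA in any of the selected columns
rows_sel <- df %>% 
  filter(if_any(c(mpg, cyl, disp), is.na))

# Rows with an NA in any column at all
rows_any <- df %>% 
  filter(if_any(everything(), is.na))

nrow(rows_sel)
#> [1] 4
nrow(rows_any)
#> [1] 4
```

The first call mirrors the three-column filter() shown earlier, whereas the second matches the complete.cases() variant.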