tidyeval, some musings on dplyr::filter

Programming with the tidyverse

… Is not exactly self-evident. It actually requires some head wrapping, at least in my experience. In this post, we are exploring some aspects on programming when filtering rows. Let’s see.

Setup

library(tidyverse)

Some filtering chunk

Let’s say we would like to filter observations according to some variable and a given threshold in some data set:

mtcars %>% 
  filter(hp > 200)
##                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
## Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
## Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## Ford Pantera L      15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

Of course, that’s a toy problem, but for the sake of the tidyverse programming thing, it may be a helpful start to keep things straight.

Now, say, instead of 200, we abstract it to threshold, a variable giving the filter value:

threshold <- 200

mtcars %>% 
  filter(hp > threshold)
##                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
## Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
## Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## Ford Pantera L      15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

Works! So filter is able to digest a variable, at least at the right hand side of the filter expression.

Now let’s abstract it to a function

filter_fun <- function(data, threshold){
  
  data %>% 
    filter(hp > threshold)
}

Let’s try it:

filter_fun(mtcars, 200)
##                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
## Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
## Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## Ford Pantera L      15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

Works!

Actract variable to be filtered

filter_fun <- function(data, col, threshold){
  
  data %>% 
    filter({{col}} > threshold)
}

Let’s try it:

filter_fun(mtcars, hp, 200)
##                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
## Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
## Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## Ford Pantera L      15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

Works!

We can make use of the “curly curly” operator as in {{col}} to explain to the tidyverse function that what now comes is a variable, and not a column name in the present data frame.