Programming with the tidyverse
… is not exactly self-evident. It takes some effort to wrap your head around it, at least in my experience. In this post, we explore some aspects of programming with the tidyverse when filtering rows. Let’s see.
Setup
library(tidyverse)
Some filtering chunk
Let’s say we would like to filter observations according to some variable and a given threshold in some data set:
mtcars %>%
  filter(hp > 200)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
## Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
Of course, that’s a toy problem, but for learning how to program with the tidyverse, a simple case is a helpful start to keep things straight.
Now, say, instead of 200, we abstract it to threshold, a variable giving the filter value:
threshold <- 200
mtcars %>%
  filter(hp > threshold)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
## Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
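Works! So filter is able to digest a variable, at least on the right-hand side of the filter expression. Since mtcars has no column called threshold, filter finds the variable of that name in the surrounding environment.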
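One caveat is worth a quick sketch: if the data frame happens to contain a column with the same name as the variable, the column takes precedence. The data frame cars2 and its threshold column below are made up for illustration, and I load only dplyr instead of the whole tidyverse. The .data and .env pronouns are one way to say explicitly where a name should be looked up.

```r
library(dplyr)

threshold <- 200

# Hypothetical data frame that happens to have a column named `threshold`
cars2 <- mtcars
cars2$threshold <- 300

# `threshold` now resolves to the column (300), not the variable (200)
masked <- cars2 %>%
  filter(hp > threshold)

# .data$ looks in the data frame, .env$ looks in the environment
explicit <- cars2 %>%
  filter(.data$hp > .env$threshold)

nrow(masked)    # 1 row: only the Maserati Bora has hp > 300
nrow(explicit)  # 7 rows: the same cars as above
```

So the plain version is fine for interactive use, but inside functions the explicit pronouns can save some surprises.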
Now let’s abstract it to a function:
filter_fun <- function(data, threshold){
  data %>%
    filter(hp > threshold)
}
Let’s try it:
filter_fun(mtcars, 200)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
## Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
Works!
Abstract the variable to be filtered
filter_fun <- function(data, col, threshold){
  data %>%
    filter({{col}} > threshold)
}
Let’s try it:
filter_fun(mtcars, hp, 200)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
## Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
Works!
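For contrast, here is a small sketch of what happens without the double braces. The function name filter_fun_plain is made up for this illustration, and the sketch assumes there is no object called hp in your global environment. The error is captured with tryCatch so that the script keeps running.

```r
library(dplyr)

# Hypothetical variant of filter_fun() WITHOUT the double braces
filter_fun_plain <- function(data, col, threshold) {
  data %>%
    filter(col > threshold)
}

# Capture the error instead of stopping the script
result <- tryCatch(
  filter_fun_plain(mtcars, hp, 200),
  error = function(e) e
)

inherits(result, "error")  # TRUE: the call fails
```

Without the braces, filter finds no column called col, falls back to the function argument, and then tries to evaluate hp outside of the data frame, where it does not exist.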
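We can make use of the “curly curly” operator, as in {{col}}, to tell the tidyverse function that col is a function argument standing in for a column. filter then looks up whatever the caller passed (here hp) in the data frame, instead of searching for a column literally named col.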
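If the column name arrives as a character string rather than as a bare name, the curly curly operator is not the right tool. A minimal sketch of one alternative, subsetting the .data pronoun with the string (the function name filter_fun_chr is made up for this example):

```r
library(dplyr)

# Hypothetical variant: `col` is a character string naming the column
filter_fun_chr <- function(data, col, threshold) {
  data %>%
    filter(.data[[col]] > threshold)
}

res <- filter_fun_chr(mtcars, "hp", 200)
nrow(res)  # 7 rows, the same cars as before
```

This form comes in handy when the column names are stored in a vector, for instance when looping over several columns.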