Like for [others], Hadley Wickham’s dplyr
, and more generally, the tidyverse
approach has considerably changed the I do data analyses. Most notably, the pipe (coming from magrittr
by Stefan Milton Bache, see here) has creeped into nearly every analyses I, do.
That is, is every analyses except for functions, and other non-interactive stuff. In those programming contexts, the dplyr way does not work, due to its non standard evaluation or NSE for short.
So, this post is about programming with dplyr, or more precisely, using the recently introduced tidyeval approach (approached into widely used R libraries, that is).
To understand the usecase, consider the following example. Say, we count the frequencies of some groups, and want to add the proportiong of these counts.
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarise(n = n())
## # A tibble: 3 x 2
## cyl n
## <dbl> <int>
## 1 4 11
## 2 6 7
## 3 8 14
Or, shorter mtcars %>% count(cyl)
.
Now, let’s add a column with the proporting of the count column (n
).
mtcars %>%
count(cyl) %>%
mutate(prop = n / sum(n))
## # A tibble: 3 x 3
## cyl n prop
## <dbl> <int> <dbl>
## 1 4 11 0.34375
## 2 6 7 0.21875
## 3 8 14 0.43750
For extending this game to more than one variable, see this post.
Now, let’s assume we would like to put the ‘add a proportion column to my dataframe’ in a function. How to proceed?
We might think this approach should work:
add_prop <- function(df, count_var, group_var){
df %>%
ungroup() %>%
group_by(group_var) %>%
mutate(prop = count_var / sum(n()))
}
mtcars %>%
count(cyl) %>%
add_prop(count_var = n, group_var = cyl)
However, this does not work.
The dplyr ‘verbs’ expect quoted input variables. That means we must quote the parameters first, before we can hand over them to dplyr. quo
basically means “hey R, don’t yet evaulate this expression. Just read it, understand the name of the expression, and wait until I tell you. And R, understand, that you got an expression, not a simple text string”. The last sentence tells us (or R) that we do not want a text string, but the quotation (or ‘citation’ of the expression if you like) of an expression to be evaluated in the futre.
mtcars %>%
count(cyl) %>%
add_prop(count_var = quo(n), group_var = quo(cyl))
The important point is quo
.
quo(mtcars$cyl)
## <quosure: global>
## ~mtcars$cyl
But there’s a second step we need to take. Now that quo
has defined a quoted expression, the dplyr verbs do not need to quote their input variables, because they are already quoted. That is, we need to tell R now: “Hey, do not quote, evaluate! We have taken care of the quoting before”.
add_prop <- function(df, count_var, group_var){
df %>%
mutate(prop = (!!count_var) / sum(!!count_var))
}
mtcars %>%
count(cyl) %>%
add_prop(count_var = quo(n), group_var = quo(cyl))
## # A tibble: 3 x 3
## cyl n prop
## <dbl> <int> <dbl>
## 1 4 11 0.34375
## 2 6 7 0.21875
## 3 8 14 0.43750
Hizzah!