More on programming with dplyr: converting quosures to strings
In an earlier post, we programmed a simple function using dplyr’s programming capabilities based on tidyeval; for more intro to programming with dplyr, see here.
In this post, we’ll go one step further and program a function where a quosure is turned into a string. Why? Because quite a number of functions out there expect strings as input parameters.
Libraries
library(tidyverse)
library(stringr)
Data example
Say we have a set of strings in which we search for a word stem. However, this stem does not appear in its “stem” form, but always with some suffix. Let our stem be “spd” (the name of the German Social Democratic party), and, for simplicity, we’ll assume two “instances” of “spd” that occur with a suffix, i.e., “spdbt” and “spdde”. (I was just working on a text mining project on tweets of German politicians, hence the example.)
data <- c("spdde", "sdf", "sdf", "fdds", "spdde", "dsf", "spdbt", "df") %>% as_tibble
stem <- "spd"
Task 01: Extract the “spd” stem out of the data
Non-programmatically, i.e., interactively, this is quite straightforward. Some knowledge of regex helps to make the task a bit more general:
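data %>%
  mutate(is_instance = str_detect(string = value, pattern = "spd\\w+"),
         # str_extract returns a plain character vector (str_match would return a matrix)
         instance = str_extract(string = value, pattern = "spd\\w+"))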
## # A tibble: 8 x 3
## value is_instance instance
## <chr> <lgl> <chr>
## 1 spdde TRUE spdde
## 2 sdf FALSE <NA>
## 3 sdf FALSE <NA>
## 4 fdds FALSE <NA>
## 5 spdde TRUE spdde
## 6 dsf FALSE <NA>
## 7 spdbt TRUE spdbt
## 8 df FALSE <NA>
\\w means “find a word character” (i.e., a letter, digit, or underscore), and + means that at least one such character is expected. pattern is the pattern to be looked for, and string defines the string in which to search for the pattern.
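As a small sketch of how the pattern behaves, the snippet below applies it to a hypothetical test vector (x is made up for illustration and is not part of the data above). Note that the bare stem on its own is not matched, since + requires at least one word character after “spd”.

```r
library(stringr)

# hypothetical test vector, not part of the original data
x <- c("spdde", "spd", "spd_bt", "xspdbt", "sdf")

# TRUE wherever "spd" is followed by at least one word character
detected <- str_detect(string = x, pattern = "spd\\w+")

# the matched part of each string, NA where nothing matches
extracted <- str_extract(string = x, pattern = "spd\\w+")
```

Because the pattern is not anchored, “xspdbt” also counts as a hit; putting a word boundary (\\b) in front of the stem would rule that out.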
Task 02: Make it a function
Obviously, a function is much more general. Say we have 10 parties we would like to work through; it becomes tedious to repeat that code many times. We will want a function. How do we do that with dplyr?
Let’s define a function with the input parameters df for the name of the data frame, col for the name of the column where the stemming is performed, and stem, the term to be stemmed.
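stemm_col <- function(df, col, stem){
  # capture the column argument as an expression (quosure) without evaluating it
  col <- enquo(col)
  # string versions of the arguments; col_name is shown for illustration and not used below
  col_name <- quo_name(enquo(col))
  stem_name <- quo_name(enquo(stem))
  df %>%
    # !!col evaluates the captured column; stem_name is pasted into the regex
    mutate(stemmed = str_extract(string = !!col,
                                 pattern = paste0(stem_name, "\\w+"))) -> output
  return(output)
}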
See whether it runs:
stemm_col(df = data, col = value, stem = "spd")
## # A tibble: 8 x 2
## value stemmed
## <chr> <chr>
## 1 spdde spdde
## 2 sdf <NA>
## 3 sdf <NA>
## 4 fdds <NA>
## 5 spdde spdde
## 6 dsf <NA>
## 7 spdbt spdbt
## 8 df <NA>
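One thing to watch out for when working through several stems: because stem is captured with enquo, a variable holding the stem is not looked up; its name is turned into the string instead. The following is a minimal sketch, assuming a trimmed copy of stemm_col and the made-up names my_stem and stems as well as the made-up stem “fdd”. Unquoting with !! at the call site passes the variable's value.

```r
library(dplyr)
library(stringr)

data <- tibble(value = c("spdde", "sdf", "sdf", "fdds", "spdde", "dsf", "spdbt", "df"))

# trimmed copy of the function from above
stemm_col <- function(df, col, stem) {
  col <- enquo(col)
  stem_name <- quo_name(enquo(stem))
  df %>%
    mutate(stemmed = str_extract(string = !!col,
                                 pattern = paste0(stem_name, "\\w+")))
}

my_stem <- "spd"  # hypothetical variable holding the stem

# the pattern becomes "my_stem\\w+", so nothing is found
not_found <- stemm_col(data, value, stem = my_stem)

# !! inserts the value of my_stem before the argument is captured
found <- stemm_col(data, value, stem = !!my_stem)

# a bare name works like the quoted string
bare <- stemm_col(data, value, stem = spd)

# looping over several stems ("fdd" is made up for illustration)
stems <- c("spd", "fdd")
results <- lapply(stems, function(s) stemm_col(data, value, stem = !!s))
names(results) <- stems
```

Here results is a list with one tibble per stem, which could be inspected one by one or combined afterwards.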
Bit by bit explanation
- col <- enquo(col): Take the parameter col, and just remember, R, that it is an expression, in this case the parameter of a function, denoting a column.
- col_name <- quo_name(enquo(col)): Now turn col (an expression) into a string. The same is done for stem, because the pattern argument of the function further down (str_extract) expects a string.
- str_extract(string = !!col, ...): Hey R, when you extract the strings, you need to know which string to search. OK, take col, which is an expression, and now evaluate it (indicated by !!); in this case, understand that it’s a column. Now go get the values.
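In stemm_col, the string col_name is created but not used afterwards. One place where such a string comes in handy is naming the new column. The sketch below is a hypothetical variant (stemm_col_named and its naming scheme are assumptions, not part of the function above) that builds the name of the output column from both strings, using !! together with := on the left-hand side inside mutate.

```r
library(dplyr)
library(stringr)

data <- tibble(value = c("spdde", "sdf", "spdbt"))

# hypothetical variant of stemm_col that names the new column after its inputs
stemm_col_named <- function(df, col, stem) {
  col <- enquo(col)
  col_name <- quo_name(col)            # the column as a string, here "value"
  stem_name <- quo_name(enquo(stem))   # the stem as a string, here "spd"
  new_name <- paste0(col_name, "_", stem_name)
  df %>%
    mutate(!!new_name := str_extract(string = !!col,
                                     pattern = paste0(stem_name, "\\w+")))
}

out <- stemm_col_named(data, value, spd)
names(out)
```

The strings are used where a name or a pattern is needed, while !!col is used where the actual column values are needed.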
Main idea
When working with non-standard evaluation (NSE), it is important to distinguish between expressions, strings, and evaluation. Using NSE, we can look at the parameters of a function and grab them before they are evaluated. That’s why the parameters need not be well-behaved; importantly, they do not need quotation marks, which saves typing for the user.
From the perspective of the R interpreter (the machine), the steps look more or less like this:
- Look at the parameters of the function
- Save the parameters not as strings, not as variables, but as expressions (via enquo)
- Convert the expression to a string (via quo_name) or, eventually,
- Evaluate it, i.e., unquote (let the function do its job) via !! (called bang-bang) or via UQ (the function form, deprecated in more recent versions of rlang).
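The steps listed above can be traced one at a time outside of a dplyr verb. This is a minimal sketch, assuming a hypothetical helper trace_steps; it uses rlang::eval_tidy to make the evaluation step explicit, which in stemm_col happens inside mutate via !!.

```r
library(dplyr)

data <- tibble(value = c("spdde", "sdf", "spdbt"))

# hypothetical helper that returns the result of each step
trace_steps <- function(df, col) {
  # save the parameter as an expression (quosure), not as a string or value
  captured <- enquo(col)
  # convert the expression to a string
  as_string <- quo_name(captured)
  # evaluate the expression, looking up the column in df
  evaluated <- rlang::eval_tidy(captured, data = df)
  list(is_quosure = rlang::is_quosure(captured),
       as_string = as_string,
       evaluated = evaluated)
}

steps <- trace_steps(data, value)
str(steps)
```

The same captured expression thus yields both a string (for functions that expect one) and the column values (for functions that work on the data).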