
How to mutate all columns of a data frame

Say you have a data frame with a number of columns, and you need to change every column in a similar way. A common example is standardizing all (numeric) variables. How do you do that in R? This post shows and explains an example using mutate_all() from dplyr, part of the tidyverse.

For the sake of simplicity, let’s stick to the question “how to z-standardize all columns” (and ignore that there are ready-made solutions, for example from the superb package sjmisc by strengejacke).
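
For reference, z-standardizing a vector means subtracting its mean and dividing by its sample standard deviation. The small base-R check below uses a made-up vector (chosen only so the arithmetic is easy to follow) and also shows that base R's scale() produces the same values:

```r
# Made-up vector: mean = 4, sample sd = sqrt((4 + 0 + 4) / 2) = 2
x <- c(2, 4, 6)

# z-standardization by hand: subtract the mean, divide by the sample sd
z_manual <- (x - mean(x)) / sd(x)
print(z_manual)
# [1] -1  0  1

# Base R's scale() centers and scales the same way but returns a matrix,
# so drop the matrix attributes before comparing
z_scale <- as.numeric(scale(x))
print(all.equal(z_manual, z_scale))
# [1] TRUE
```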

library(tidyverse)
data(iris)

Easy but inefficient way

iris %>% 
  mutate(Sepal.Length_z = (Sepal.Length - mean(Sepal.Length)) / sd(Sepal.Length),
         Sepal.Width_z = (Sepal.Width - mean(Sepal.Width)) / sd(Sepal.Width)) %>% 
  head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_z
## 1          5.1         3.5          1.4         0.2  setosa     -0.8976739
## 2          4.9         3.0          1.4         0.2  setosa     -1.1392005
## 3          4.7         3.2          1.3         0.2  setosa     -1.3807271
## 4          4.6         3.1          1.5         0.2  setosa     -1.5014904
## 5          5.0         3.6          1.4         0.2  setosa     -1.0184372
## 6          5.4         3.9          1.7         0.4  setosa     -0.5353840
##   Sepal.Width_z
## 1    1.01560199
## 2   -0.13153881
## 3    0.32731751
## 4    0.09788935
## 5    1.24503015
## 6    1.93331463

Beware the parentheses; it’s easy to get bitten (it happened to me).

Clearly, this approach does not scale well. In addition, you’ll strain your hand, and you will enjoy the funniest typos.

Define helper function

A useful first step is to define a helper function that we will apply to every column:

z_std <- function(observed) {
  # z-standardize: center on the mean, then divide by the standard deviation
  (observed - mean(observed)) / sd(observed)
}

Of course, such a function already exists countless times in other scripts, and yes, it is not crafted beautifully, but it will serve as a pragmatic start.
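
One way in which the helper is not crafted beautifully: a single NA in a column turns the whole result into NA, because mean() and sd() return NA by default. A slightly more defensive variant might look like the sketch below; the name z_std_na is made up for this illustration and is not used elsewhere in the post:

```r
# Hypothetical, more defensive variant of z_std(): ignores missing values
# when computing the mean and standard deviation; NA inputs stay NA.
z_std_na <- function(observed) {
  stopifnot(is.numeric(observed))
  center <- mean(observed, na.rm = TRUE)
  spread <- sd(observed, na.rm = TRUE)
  (observed - center) / spread
}

print(z_std_na(c(2, NA, 4, 6)))
# [1] -1 NA  0  1
```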

Now let’s apply it:

iris %>% 
  mutate(Sepal.Length_z = z_std(Sepal.Length)) %>% 
  head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_z
## 1          5.1         3.5          1.4         0.2  setosa     -0.8976739
## 2          4.9         3.0          1.4         0.2  setosa     -1.1392005
## 3          4.7         3.2          1.3         0.2  setosa     -1.3807271
## 4          4.6         3.1          1.5         0.2  setosa     -1.5014904
## 5          5.0         3.6          1.4         0.2  setosa     -1.0184372
## 6          5.4         3.9          1.7         0.4  setosa     -0.5353840

Much cleaner, simpler, more relaxing.
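
A quick way to convince yourself that the helper does what it should: a standardized column should have mean 0 and standard deviation 1 (up to floating-point error), and it should agree with base R's scale(). The helper is repeated so the snippet runs on its own:

```r
# Same helper as above, repeated so this snippet is self-contained
z_std <- function(observed) {
  (observed - mean(observed)) / sd(observed)
}

z <- z_std(iris$Sepal.Length)

# Mean ~ 0 and sd ~ 1, up to floating-point error
print(round(mean(z), 10))
print(round(sd(z), 10))

# Agrees with base R's scale(), which returns a one-column matrix
print(all.equal(z, as.numeric(scale(iris$Sepal.Length))))
```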

Now to the conveyer belt

Now let’s apply it to each numeric column; select_if(is.numeric) first drops the factor Species, which cannot be standardized:

iris %>% 
  select_if(is.numeric) %>% 
  mutate_all(funs(z = z_std(.))) %>% 
  head()
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## please use list() instead
## 
## # Before:
## funs(name = f(.)
## 
## # After: 
## list(name = ~f(.))
## This warning is displayed once per session.
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Length_z
## 1          5.1         3.5          1.4         0.2     -0.8976739
## 2          4.9         3.0          1.4         0.2     -1.1392005
## 3          4.7         3.2          1.3         0.2     -1.3807271
## 4          4.6         3.1          1.5         0.2     -1.5014904
## 5          5.0         3.6          1.4         0.2     -1.0184372
## 6          5.4         3.9          1.7         0.4     -0.5353840
##   Sepal.Width_z Petal.Length_z Petal.Width_z
## 1    1.01560199      -1.335752     -1.311052
## 2   -0.13153881      -1.335752     -1.311052
## 3    0.32731751      -1.392399     -1.311052
## 4    0.09788935      -1.279104     -1.311052
## 5    1.24503015      -1.335752     -1.311052
## 6    1.93331463      -1.165809     -1.048667
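
Note that select_if(is.numeric) removes Species from the result. If you want to keep the non-numeric columns, one option (assuming dplyr 0.8 or later, where mutate_if() accepts the same list() and formula syntax as mutate_all()) is to let mutate_if() pick the numeric columns instead:

```r
library(dplyr)

z_std <- function(observed) {
  (observed - mean(observed)) / sd(observed)
}

# mutate_if() only touches columns for which the predicate is TRUE,
# so Species is kept and the z-scores are added as new *_z columns
iris_z <- iris %>%
  mutate_if(is.numeric, list(z = ~ z_std(.)))

print(names(iris_z))
```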

Changes in dplyr

You might have noticed this warning:

Warning: funs() is soft deprecated as of dplyr 0.8.0

So let’s update the code above as the warning suggests: replace funs() with list() and write the function call as a ~ formula.

iris %>% 
  select_if(is.numeric) %>% 
  mutate_all(list(z = ~ z_std(.))) %>% 
  head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Length_z
## 1          5.1         3.5          1.4         0.2     -0.8976739
## 2          4.9         3.0          1.4         0.2     -1.1392005
## 3          4.7         3.2          1.3         0.2     -1.3807271
## 4          4.6         3.1          1.5         0.2     -1.5014904
## 5          5.0         3.6          1.4         0.2     -1.0184372
## 6          5.4         3.9          1.7         0.4     -0.5353840
##   Sepal.Width_z Petal.Length_z Petal.Width_z
## 1    1.01560199      -1.335752     -1.311052
## 2   -0.13153881      -1.335752     -1.311052
## 3    0.32731751      -1.392399     -1.311052
## 4    0.09788935      -1.279104     -1.311052
## 5    1.24503015      -1.335752     -1.311052
## 6    1.93331463      -1.165809     -1.048667
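
Because list() simply collects functions, you can also pass the helper by name instead of wrapping it in a ~ formula. A minimal sketch, assuming the z_std() helper defined earlier; both forms should give the same result:

```r
library(dplyr)

z_std <- function(observed) {
  (observed - mean(observed)) / sd(observed)
}

numeric_iris <- iris %>% select_if(is.numeric)

# Formula (lambda) syntax, as used in this post
with_formula <- numeric_iris %>% mutate_all(list(z = ~ z_std(.)))

# Plain function name: no ~ and no . placeholder needed
with_function <- numeric_iris %>% mutate_all(list(z = z_std))

print(all.equal(with_formula, with_function))
```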

The mutate_all(list(z = ~ z_std(.))) call can be written more compactly by passing the formula on its own:

iris %>% 
  select_if(is.numeric) %>% 
  mutate_all(~ z_std(.)) %>% 
  head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1   -0.8976739  1.01560199    -1.335752   -1.311052
## 2   -1.1392005 -0.13153881    -1.335752   -1.311052
## 3   -1.3807271  0.32731751    -1.392399   -1.311052
## 4   -1.5014904  0.09788935    -1.279104   -1.311052
## 5   -1.0184372  1.24503015    -1.335752   -1.311052
## 6   -0.5353840  1.93331463    -1.165809   -1.048667

Note that if you don’t supply a name (suffix) such as z, as in the list() examples above, mutate_all() silently overwrites the original variables instead of adding new columns.
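
In later dplyr releases (1.0.0 and up), the scoped verbs such as mutate_all() and mutate_if() are superseded by across() inside a plain mutate(). A rough equivalent, assuming a recent dplyr, is sketched below. The same naming rule applies: a named list adds new _z columns, while a bare function overwrites the originals:

```r
library(dplyr)

z_std <- function(observed) {
  (observed - mean(observed)) / sd(observed)
}

# Named list: adds Sepal.Length_z, ... and keeps the originals and Species
iris_added <- iris %>%
  mutate(across(where(is.numeric), list(z = z_std)))

# Bare function: overwrites the numeric columns in place
iris_overwritten <- iris %>%
  mutate(across(where(is.numeric), z_std))

print(head(iris_added))
print(head(iris_overwritten))
```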