Easy way to convert factors to numbers

Converting factors to numbers in R can be frustrating. Consider the following situation: we have some data and try to convert a factor (sex in tips, see below) to a numeric variable:

library(tidyverse)
library(sjmisc)   # for recoding
data(tips, package = "reshape2")
glimpse(tips)
#> Observations: 244
#> Variables: 7
#> $ total_bill <dbl> 16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26....
#> $ tip        <dbl> 1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.9...
#> $ sex        <fct> Female, Male, Male, Male, Female, Male, Male, Male,...
#> $ smoker     <fct> No, No, No, No, No, No, No, No, No, No, No, No, No,...
#> $ day        <fct> Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, S...
#> $ time       <fct> Dinner, Dinner, Dinner, Dinner, Dinner, Dinner, Din...
#> $ size       <int> 2, 3, 3, 2, 4, 4, 2, 4, 2, 2, 2, 4, 2, 4, 2, 2, 3, ...

OK, here we go. Say we want "Female" = 1 and "Male" = 0.

tips <- sjmisc::rec(tips, sex, rec = "Female = 1; Male = 0")
glimpse(tips)
#> Observations: 244
#> Variables: 8
#> $ total_bill <dbl> 16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26....
#> $ tip        <dbl> 1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.9...
#> $ sex        <fct> Female, Male, Male, Male, Female, Male, Male, Male,...
#> $ smoker     <fct> No, No, No, No, No, No, No, No, No, No, No, No, No,...
#> $ day        <fct> Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, S...
#> $ time       <fct> Dinner, Dinner, Dinner, Dinner, Dinner, Dinner, Din...
#> $ size       <int> 2, 3, 3, 2, 4, 4, 2, 4, 2, 2, 2, 4, 2, 4, 2, 2, 3, ...
#> $ sex_r      <fct> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, ...

Seems to have worked so far. But wait: sex_r is still a factor, not a numeric variable. So let's convert it using as.numeric():

tips$sex_num <- as.numeric(tips$sex_r)
glimpse(tips)
#> Observations: 244
#> Variables: 9
#> $ total_bill <dbl> 16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26....
#> $ tip        <dbl> 1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.9...
#> $ sex        <fct> Female, Male, Male, Male, Female, Male, Male, Male,...
#> $ smoker     <fct> No, No, No, No, No, No, No, No, No, No, No, No, No,...
#> $ day        <fct> Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, S...
#> $ time       <fct> Dinner, Dinner, Dinner, Dinner, Dinner, Dinner, Din...
#> $ size       <int> 2, 3, 3, 2, 4, 4, 2, 4, 2, 2, 2, 4, 2, 4, 2, 2, 3, ...
#> $ sex_r      <fct> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, ...
#> $ sex_num    <dbl> 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, ...

Oh no! That’s not what we wanted. Did R mess things up? Not quite: internally, R stores a factor as integer codes, with the first level stored as 1, the second level as 2, and so on. as.numeric() returns these codes, not the labels. So what is the first factor level in our case? Let’s see:

factor(tips$sex) %>% head()
#> [1] Female Male   Male   Male   Female Male  
#> Levels: Female Male
factor(tips$sex_r) %>% head()
#> [1] 1 0 0 0 1 0
#> Levels: 0 1

That’s confusing: “0” is the first level of sex_r, so R represents it internally as 1. The second level of sex_r is “1”, which R represents internally as 2. That’s why we get these numbers:

head(tips$sex_num)
#> [1] 2 1 1 1 2 1
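To look at these codes directly, we can ask for the levels and the underlying integers. Here is a minimal sketch on a small made-up vector f that mimics sex_r (it is not the actual tips data):

```r
# Toy factor mimicking sex_r: labels "1" and "0"
f <- factor(c("1", "0", "0", "0", "1", "0"))

# Levels are sorted, so "0" is level 1 and "1" is level 2
levels(f)

# The internal integer codes, not the labels: 2 1 1 1 2 1
as.integer(f)

# Same codes, with the levels kept as an attribute
unclass(f)
```

as.numeric(f) returns exactly these codes (as doubles), which is what happened to sex_num above.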

Solution

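One solution is to use readr::parse_number(), which reads the number out of each label. It expects a character vector, so convert the factor with as.character() first:

tips$sex_num <- parse_number(as.character(tips$sex_r))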
head(tips$sex_num)
#> [1] 1 0 0 0 1 0
head(tips$sex_r)
#> [1] 1 0 0 0 1 0
#> Levels: 0 1

Worked!
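readr is not strictly needed, though. The base R idiom (the one given in the R FAQ, 7.10) also goes through the labels: convert the factor to character, then to numeric. as.numeric(levels(f))[f] gives the same result and converts each level only once, which can be a bit faster for long vectors. Below is a minimal sketch on a made-up toy vector; factor_to_num() is a hypothetical helper written for illustration, not a function from any package.

```r
# Toy factor mimicking sex_r (labels "1"/"0", levels "0" < "1")
f <- factor(c("1", "0", "0", "0", "1", "0"))

# Wrong: returns the internal codes 2 1 1 1 2 1
as.numeric(f)

# Right: go through the labels
as.numeric(as.character(f))  # 1 0 0 0 1 0
as.numeric(levels(f))[f]     # 1 0 0 0 1 0 (indexing by a factor uses its codes)

# Hypothetical helper: convert labels to numbers,
# stopping if any level cannot be read as a number
factor_to_num <- function(x) {
  lv <- levels(x)
  num <- suppressWarnings(as.numeric(lv))
  bad <- lv[is.na(num) & !is.na(lv)]
  if (length(bad) > 0) {
    stop("non-numeric levels: ", paste(bad, collapse = ", "))
  }
  num[x]  # look up each element's level by its integer code
}

factor_to_num(f)  # 1 0 0 0 1 0
```

In a dplyr pipeline the same idea reads mutate(tips, sex_num = as.numeric(as.character(sex_r))). And if all we need is Female = 1 and Male = 0, comparing directly skips the factor detour altogether: as.numeric(tips$sex == "Female").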