# Easy way to convert factors to numbers

Converting factors to numbers in R can be frustrating. Consider the following situation: we have some data and try to convert a factor (sex in tips, see below) to a numeric variable:

library(tidyverse)
library(sjmisc)   # for recoding
data(tips, package = "reshape2")
glimpse(tips)
#> Observations: 244
#> Variables: 7
#> $ total_bill <dbl> 16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26....
#> $ tip        <dbl> 1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.9...
#> $ sex        <fct> Female, Male, Male, Male, Female, Male, Male, Male,...
#> $ smoker     <fct> No, No, No, No, No, No, No, No, No, No, No, No, No,...
#> $ day        <fct> Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, S...
#> $ time       <fct> Dinner, Dinner, Dinner, Dinner, Dinner, Dinner, Din...
#> $ size       <int> 2, 3, 3, 2, 4, 4, 2, 4, 2, 2, 2, 4, 2, 4, 2, 2, 3, ...

Ok, here we go. Say we want "Female" = 1 and "Male" = 0.

tips <- sjmisc::rec(tips, sex, rec = "Female = 1; Male = 0")
glimpse(tips)
#> Observations: 244
#> Variables: 8
#> $ total_bill <dbl> 16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26....
#> $ tip        <dbl> 1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.9...
#> $ sex        <fct> Female, Male, Male, Male, Female, Male, Male, Male,...
#> $ smoker     <fct> No, No, No, No, No, No, No, No, No, No, No, No, No,...
#> $ day        <fct> Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, S...
#> $ time       <fct> Dinner, Dinner, Dinner, Dinner, Dinner, Dinner, Din...
#> $ size       <int> 2, 3, 3, 2, 4, 4, 2, 4, 2, 2, 2, 4, 2, 4, 2, 2, 3, ...
#> $ sex_r      <fct> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, ...

Seems to have worked so far. Wait, sex_r is still a factor, not numeric. So convert it using as.numeric:

tips$sex_num <- as.numeric(tips$sex_r)
glimpse(tips)
#> Observations: 244
#> Variables: 9
#> $ total_bill <dbl> 16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26....
#> $ tip        <dbl> 1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.9...
#> $ sex        <fct> Female, Male, Male, Male, Female, Male, Male, Male,...
#> $ smoker     <fct> No, No, No, No, No, No, No, No, No, No, No, No, No,...
#> $ day        <fct> Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, Sun, S...
#> $ time       <fct> Dinner, Dinner, Dinner, Dinner, Dinner, Dinner, Din...
#> $ size       <int> 2, 3, 3, 2, 4, 4, 2, 4, 2, 2, 2, 4, 2, 4, 2, 2, 3, ...
#> $ sex_r      <fct> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, ...
#> $ sex_num    <dbl> 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, ...

Oh no! That’s not what we wanted! R has messed things up (?). The reason is that R stores a factor internally as integer codes: the first factor level is the number 1, the second level is the number 2, and so on. as.numeric returns these internal codes, not the labels we see. What’s the first factor level in our case? Let’s see:

factor(tips$sex) %>% head()
#> [1] Female Male Male Male Female Male
#> Levels: Female Male

factor(tips$sex_r) %>% head()
#> [1] 1 0 0 0 1 0
#> Levels: 0 1
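
The same behaviour can be reproduced on a small toy factor, independent of the tips data. This is a minimal sketch; the vector f is made up for illustration and is not part of the original example:

```r
# Toy factor, made up for illustration (not part of the tips data)
f <- factor(c("1", "0", "0", "1"))

# Levels are sorted, so "0" comes first and "1" second
levels(f)
#> [1] "0" "1"

# Internal codes: the position of each value within levels(f)
as.integer(f)
#> [1] 2 1 1 2
```

The label "1" sits at position 2 in the levels, which is why its internal code is 2.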

That’s confusing: “0” is the first level of sex_r, so R represents it internally as 1. The second level of sex_r is “1”, which R represents internally as 2. That’s why we get these numbers:

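head(tips$sex_num)
#> [1] 2 1 1 1 2 1

## Solution

One solution is to use readr::parse_number(). Its documentation describes the input as a character vector, so we convert the factor labels to character first:

tips$sex_num <- parse_number(as.character(tips$sex_r))
head(tips$sex_num)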
#> [1] 1 0 0 0 1 0
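
If you would rather not depend on readr, base R can do the same conversion. The sketch below uses a toy factor named sex_r (made up here to stand in for tips$sex_r) and shows two common idioms. Both assume that every factor label is a valid number:

```r
# Toy factor standing in for tips$sex_r (made up for illustration)
sex_r <- factor(c("1", "0", "0", "0", "1", "0"))

# Not what we want: this returns the internal level codes
as.numeric(sex_r)
#> [1] 2 1 1 1 2 1

# Convert the labels to character first, then to numeric
as.numeric(as.character(sex_r))
#> [1] 1 0 0 0 1 0

# Equivalent: convert the levels once, then index them by the codes
as.numeric(levels(sex_r))[sex_r]
#> [1] 1 0 0 0 1 0
```

The second idiom converts only the (few) levels rather than every element, which can matter for long vectors. For a column the size of tips$sex_r either one is fine.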