1 Load packages
library(tidyverse) # data wrangling
library(digest)
library(rstanarm)
2 Motivation
Reproducibility of results is a major concern in science and industry alike. However, there are numerous pitfalls which may threaten reproducibility. This post explores one possible issue - the fixation of start values of random numbers drawn by R.
Reasons that seeds in R across different systems may (possibly) vary (I don’t know if they do, but I heard so) may include:
- different OSes
- different version of OSes
- different versions of R
3 User error
Maybe the most straight forward explanation is a user error.
Don’t forget to call set.seed()
right before the calculation.
Be sure to use a pre defined seed value.
4 Your help needed
Please let me know whether (or not) you get the same random numbers, once the seed is fixed. In order to check the possible influence source, please note your OS, and the R version (or paste the session info).
Thanks!
5 Same random numbers
What set.seed()
does is the fix the value of .Random.seed
,
which is in turn used to initiate the random number generator of R.
.Random.seed %>% head()
#> [1] 10403 624 -709689667 906139226 -133236717 1588108856
5.1 Without seed
Without seed, the random numbers drawn by R will be transient, ie., subject of change:
rnorm(1)
#> [1] -1.051114
replicate(n = 10, expr = rnorm(1))
#> [1] 0.01176695 -1.21472604 1.86946816 0.72725355 -2.53246585 -0.17330070
#> [7] -1.61645738 0.44063028 0.73718828 -1.89312377
5.2 With seed
set.seed(42)
rnorm(1)
#> [1] 1.370958
set.seed(42)
replicate(n = 10, expr = rnorm(1))
#> [1] 1.37095845 -0.56469817 0.36312841 0.63286260 0.40426832 -0.10612452
#> [7] 1.51152200 -0.09465904 2.01842371 -0.06271410
5.3 Using a hash
Let’s use a hash number to control the output more succinctly:
set.seed(42)
x <- replicate(n = 10, expr = rnorm(1))
digest(x)
#> [1] "8cbf5bafbba3d3c63d39c586f29b3a3d"
Let’s recompute and compare the hash values:
set.seed(42)
x2 <- replicate(n = 10, expr = rnorm(1))
digest(x2)
#> [1] "8cbf5bafbba3d3c63d39c586f29b3a3d"
identical(x, x2)
#> [1] TRUE
Identical, as expected.
6 Seeds in regression models
6.1 lm
set.seed(42)
lm1 <- lm(mpg ~ hp, data = mtcars)
lm1_coef <- coef(lm1)
lm1_coef
#> (Intercept) hp
#> 30.09886054 -0.06822828
Rerun to double check:
set.seed(42)
lm1 <- lm(mpg ~ hp, data = mtcars)
lm1_coef <- coef(lm1)
lm1_coef
#> (Intercept) hp
#> 30.09886054 -0.06822828
6.2 Stan mtcars
lm2 <- stan_glm(mpg ~ hp, data = mtcars, seed = 42, refresh = 0)
lm2_coef <- coef(lm2)
lm2_coef
#> (Intercept) hp
#> 30.11668130 -0.06820988
Rerun:
lm2 <- stan_glm(mpg ~ hp, data = mtcars, seed = 42, refresh = 0)
lm2_coef <- coef(lm2)
lm2_coef
#> (Intercept) hp
#> 30.11668130 -0.06820988
6.3 Stan penguins
Import data and run the model:
penguins <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv")
library(rstanarm)
lm3 <- stan_glm(body_mass_g ~ bill_length_mm,
seed = 42,
refresh = 0,
data = penguins)
coef(lm3)
#> (Intercept) bill_length_mm
#> 356.57465 87.45281
By the way:
Copy-paste the chunk above (which is self-containing in terms of data and R packages) and then type reprex::reprex()
in the console (you need to have the package reprex
installed). You’ll get a reproducible example on your clipboard.
Rerun:
lm3 <- stan_glm(body_mass_g ~ bill_length_mm,
seed = 42,
refresh = 0,
data = penguins)
coef(lm3)
#> (Intercept) bill_length_mm
#> 356.57465 87.45281
7 Session info
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23)
#> os macOS Big Sur ... 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2023-01-18
#> pandoc 2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)
#> base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.2.0)
#> bayesplot 1.10.0 2022-11-16 [1] CRAN (R 4.2.0)
#> blogdown 1.16 2022-12-13 [1] CRAN (R 4.2.0)
#> bookdown 0.31 2022-12-13 [1] CRAN (R 4.2.0)
#> boot 1.3-28.1 2022-11-22 [1] CRAN (R 4.2.0)
#> broom 1.0.2 2022-12-15 [1] CRAN (R 4.2.0)
#> bslib 0.4.2 2022-12-16 [1] CRAN (R 4.2.0)
#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)
#> callr 3.7.3 2022-11-02 [1] CRAN (R 4.2.0)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)
#> cli 3.6.0 2023-01-09 [1] CRAN (R 4.2.0)
#> codetools 0.2-18 2020-11-04 [2] CRAN (R 4.2.1)
#> colorout * 1.2-2 2022-06-13 [1] local
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
#> colourpicker 1.2.0 2022-10-28 [1] CRAN (R 4.2.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.1)
#> crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.2.0)
#> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.0)
#> dbplyr 2.2.1 2022-06-27 [1] CRAN (R 4.2.0)
#> devtools 2.4.5 2022-10-11 [1] CRAN (R 4.2.1)
#> digest * 0.6.31 2022-12-11 [1] CRAN (R 4.2.0)
#> dplyr * 1.0.10 2022-09-01 [1] CRAN (R 4.2.0)
#> DT 0.26 2022-10-19 [1] CRAN (R 4.2.0)
#> dygraphs 1.1.1.6 2018-07-11 [1] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.19 2022-12-13 [1] CRAN (R 4.2.0)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> forcats * 0.5.2 2022-08-19 [1] CRAN (R 4.2.0)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
#> gargle 1.2.1 2022-09-08 [1] CRAN (R 4.2.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
#> ggplot2 * 3.4.0 2022-11-04 [1] CRAN (R 4.2.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> googledrive 2.0.0 2021-07-08 [1] CRAN (R 4.2.0)
#> googlesheets4 1.0.1 2022-08-13 [1] CRAN (R 4.2.0)
#> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.2.0)
#> gtable 0.3.1 2022-09-01 [1] CRAN (R 4.2.0)
#> gtools 3.9.4 2022-11-27 [1] CRAN (R 4.2.0)
#> haven 2.5.1 2022-08-22 [1] CRAN (R 4.2.0)
#> hms 1.1.2 2022-08-19 [1] CRAN (R 4.2.0)
#> htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.0)
#> htmlwidgets 1.6.1 2023-01-07 [1] CRAN (R 4.2.0)
#> httpuv 1.6.8 2023-01-12 [1] CRAN (R 4.2.0)
#> httr 1.4.4 2022-08-17 [1] CRAN (R 4.2.0)
#> igraph 1.3.5 2022-09-22 [1] CRAN (R 4.2.0)
#> inline 0.3.19 2021-05-31 [1] CRAN (R 4.2.0)
#> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
#> jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.0)
#> knitr 1.41 2022-11-18 [1] CRAN (R 4.2.0)
#> later 1.3.0 2021-08-18 [1] CRAN (R 4.2.0)
#> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.1)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0)
#> lme4 1.1-31 2022-11-01 [1] CRAN (R 4.2.0)
#> loo 2.5.1 2022-03-24 [1] CRAN (R 4.2.0)
#> lubridate 1.9.0 2022-11-06 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> markdown 1.4 2022-11-16 [1] CRAN (R 4.2.0)
#> MASS 7.3-58.1 2022-08-03 [1] CRAN (R 4.2.0)
#> Matrix 1.5-3 2022-11-11 [1] CRAN (R 4.2.0)
#> matrixStats 0.63.0 2022-11-18 [1] CRAN (R 4.2.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
#> mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
#> miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.2.0)
#> minqa 1.2.5 2022-10-19 [1] CRAN (R 4.2.1)
#> modelr 0.1.10 2022-11-11 [1] CRAN (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> nlme 3.1-161 2022-12-15 [1] CRAN (R 4.2.0)
#> nloptr 2.0.3 2022-05-26 [1] CRAN (R 4.2.0)
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.0)
#> pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> pkgload 1.3.2 2022-11-16 [1] CRAN (R 4.2.0)
#> plyr 1.8.8 2022-11-11 [1] CRAN (R 4.2.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
#> processx 3.8.0 2022-10-26 [1] CRAN (R 4.2.0)
#> profvis 0.3.7 2020-11-02 [1] CRAN (R 4.2.0)
#> promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.0)
#> ps 1.7.2 2022-10-26 [1] CRAN (R 4.2.0)
#> purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> Rcpp * 1.0.9 2022-07-08 [1] CRAN (R 4.2.0)
#> RcppParallel 5.1.6 2023-01-09 [1] CRAN (R 4.2.0)
#> readr * 2.1.3 2022-10-01 [1] CRAN (R 4.2.0)
#> readxl 1.4.1 2022-08-17 [1] CRAN (R 4.2.0)
#> remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.0)
#> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.2.0)
#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.0)
#> rmarkdown 2.19 2022-12-15 [1] CRAN (R 4.2.0)
#> rstan 2.21.7 2022-09-08 [1] CRAN (R 4.2.0)
#> rstanarm * 2.21.3 2022-04-09 [1] CRAN (R 4.2.0)
#> rstantools 2.2.0 2022-04-08 [1] CRAN (R 4.2.0)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0)
#> rvest 1.0.3 2022-08-19 [1] CRAN (R 4.2.0)
#> sass 0.4.4 2022-11-24 [1] CRAN (R 4.2.0)
#> scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> shiny 1.7.4 2022-12-15 [1] CRAN (R 4.2.0)
#> shinyjs 2.1.0 2021-12-23 [1] CRAN (R 4.2.0)
#> shinystan 2.6.0 2022-03-03 [1] CRAN (R 4.2.0)
#> shinythemes 1.2.0 2021-01-25 [1] CRAN (R 4.2.0)
#> StanHeaders 2.21.0-7 2020-12-17 [1] CRAN (R 4.2.0)
#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.0)
#> stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.0)
#> survival 3.5-0 2023-01-09 [1] CRAN (R 4.2.0)
#> threejs 0.3.3 2020-01-21 [1] CRAN (R 4.2.0)
#> tibble * 3.1.8 2022-07-22 [1] CRAN (R 4.2.0)
#> tidyr * 1.2.1 2022-09-08 [1] CRAN (R 4.2.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0)
#> tidyverse * 1.3.2 2022-07-18 [1] CRAN (R 4.2.0)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.0)
#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
#> urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.2.0)
#> usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.0)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
#> vctrs 0.5.1 2022-11-16 [1] CRAN (R 4.2.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.36 2022-12-21 [1] CRAN (R 4.2.0)
#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)
#> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.0)
#> xts 0.12.2 2022-10-16 [1] CRAN (R 4.2.0)
#> yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.0)
#> zoo 1.8-11 2022-09-17 [1] CRAN (R 4.2.0)
#>
#> [1] /Users/sebastiansaueruser/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────