Exercises on data wrangling with the tidyverse

library(tidyverse)

1 Exercise collection: Life expectancy

Get the data from this source.

gapminder_raw <- read_csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv")

2 Disclosure

These exercises are based on a tutorial by Rebekka Barter. Great work!

3 Research questions

How did life expectancy change over the last decades? Did it change differently between the continents?

How does life expectancy differ today between the continents?

Is life expectancy related to GDP? If so, to what degree (and in what form)? Is this association moderated by continent?

4 First steps

First, open a script or Rmd file in RStudio. Next, make sure you start (“load”) the necessary R packages and import the data.

5 Getting help

  • Data wrangling cheatsheet
  • Data visualization cheatsheet
  • It’s a no-brainer, but it works: just google it. For example, if you are struggling to reduce the transparency of dots in a (ggplot2) plot, try “ggplot2 reduce transparency points” or similar queries.
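
As an illustration of what such a search typically turns up, here is a minimal sketch of controlling point transparency in ggplot2. The data frame `toy` and its values are invented for this example; `alpha` is the standard ggplot2 aesthetic for opacity (1 is fully opaque, 0 is fully transparent).

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3))

# Setting alpha outside aes() applies the same opacity to every point
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(alpha = 0.3)
```

Calling `print(p)` (or just typing `p` in the console) draws the plot.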

6 Exercises

6.1 Data Wrangling

  1. Filter the data for the Americas in 2007, deselect all other variables.

  2. Create the variable gdp, defined as the product of population size and GDP per person.

  3. Identify the observation with the lowest GDP per person.

  4. Identify all observations with above-average life expectancy, stratified by continent.

  5. Count the observations identified in the last step.

  6. Compute the mean life expectancy (the grand mean; i.e., across all observations).

  7. Compute the mean life expectancy for each year.
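
If the dplyr verbs behind these tasks are unfamiliar, the following sketch shows them on a small made-up tibble rather than on the real data, so it does not replace the linked solutions. The object `toy` and all of its values are invented; the column names are assumed to match those in the gapminder file (`country`, `continent`, `year`, `pop`, `lifeExp`, `gdpPercap`). Only dplyr is loaded here, which `library(tidyverse)` attaches anyway.

```r
library(dplyr)

# Toy data: invented values, column names assumed to mirror the gapminder file
toy <- tibble(
  country   = c("A", "B", "C", "D"),
  continent = c("X", "X", "Y", "Y"),
  year      = c(2007, 2007, 2007, 2007),
  pop       = c(10, 20, 30, 40),
  lifeExp   = c(60, 70, 50, 80),
  gdpPercap = c(1000, 2000, 500, 4000)
)

# filter() keeps rows that match a condition; select() keeps the named columns
toy_x <- toy %>%
  filter(continent == "X", year == 2007) %>%
  select(country, lifeExp)

# mutate() adds a column computed from existing ones
toy_gdp <- toy %>%
  mutate(gdp = pop * gdpPercap)

# slice_min() returns the row(s) with the smallest value of a column
poorest <- toy %>%
  slice_min(gdpPercap, n = 1)

# A grouped filter() compares each row with the mean of its own group
above_avg <- toy %>%
  group_by(continent) %>%
  filter(lifeExp > mean(lifeExp)) %>%
  ungroup()

# nrow() counts the rows that remain
n_above <- nrow(above_avg)

# summarise() collapses rows to one value, overall or per group
grand_mean   <- toy %>% summarise(mean_lifeExp = mean(lifeExp))
by_continent <- toy %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp))
```

The same pattern should carry over to `gapminder_raw` by swapping in the real object and the grouping variable each exercise asks for.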

6.2 Data Visualization

  1. Create a scatter plot showing the association of GDP per person and life expectancy. Put the putative cause on the x axis and the putative effect on the y axis.

  2. Add a smoothed trend line (for example, a LOESS smoother).

  3. Add a linear model line.

  4. Create a scatter plot with year on the x axis, and life expectancy on the y axis. Each point should indicate the average life expectancy per year. Connect the dots with a line.

  5. Modify the last plot so that there is a line for each continent (i.e., group by continent).

  6. Create a scatter plot showing the association of GDP per person and life expectancy. Put the putative cause on the x axis and the putative effect on the y axis. The color of the dots should map to the respective continent.

  7. Modify the last plot so that the size of the dots represents the population size. In addition, increase the transparency of the dots in order to mitigate overplotting.

  8. Modify the last plot so that there’s a facet (sub-plot) for each continent.

  9. Modify the last plot so that GDP is log-transformed.
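
The ggplot2 building blocks that these tasks draw on can be sketched on made-up data as follows. This is one possible combination, not the reference solution: `toy` and its values are invented, the column names are assumed to mirror the gapminder file, and the particular choices (an `alpha` of 0.5, a linear smoother) are arbitrary.

```r
library(ggplot2)

# Toy data: invented values, column names assumed to mirror the gapminder file
toy <- data.frame(
  continent = rep(c("X", "Y"), each = 4),
  gdpPercap = c(500, 1000, 2000, 4000, 800, 1600, 3200, 6400),
  lifeExp   = c(50, 55, 62, 70, 48, 57, 66, 75),
  pop       = c(1, 2, 3, 4, 2, 4, 6, 8)
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  # color and size are mapped to variables inside aes();
  # alpha is set to a constant outside aes() to reduce overplotting
  geom_point(aes(color = continent, size = pop), alpha = 0.5) +
  # method = "lm" fits a straight line; method = "loess" would fit a LOESS smoother
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # one panel (facet) per continent
  facet_wrap(~ continent) +
  # log10 scale on the x axis
  scale_x_log10()
```

Use `print(p)` to draw the plot. To connect points with a line, as in the plots of yearly averages, `geom_line()` can be added as a further layer.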

7 Solutions

You’ll find the solutions here.

8 Reproducibility

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS  10.16                
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2021-02-24                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                           
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                   
#>  blogdown      1.1        2021-01-19 [1] CRAN (R 4.0.2)                   
#>  bookdown      0.21.6     2021-02-02 [1] Github (rstudio/bookdown@6c7346a)
#>  bslib         0.2.4.9000 2021-02-02 [1] Github (rstudio/bslib@b3cd7a9)   
#>  cachem        1.0.4      2021-02-13 [1] CRAN (R 4.0.2)                   
#>  callr         3.5.1      2020-10-13 [1] CRAN (R 4.0.2)                   
#>  cli           2.3.1      2021-02-23 [1] CRAN (R 4.0.2)                   
#>  crayon        1.4.1      2021-02-08 [1] CRAN (R 4.0.2)                   
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)                   
#>  devtools      2.3.2      2020-09-18 [1] CRAN (R 4.0.2)                   
#>  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                   
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)                   
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)                   
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.0.2)                   
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                   
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                   
#>  htmltools     0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)                   
#>  jquerylib     0.1.3      2020-12-17 [1] CRAN (R 4.0.2)                   
#>  jsonlite      1.7.2      2020-12-09 [1] CRAN (R 4.0.2)                   
#>  knitr         1.31       2021-01-27 [1] CRAN (R 4.0.2)                   
#>  lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.0.2)                   
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                   
#>  memoise       2.0.0      2021-01-26 [1] CRAN (R 4.0.2)                   
#>  pkgbuild      1.2.0      2020-12-15 [1] CRAN (R 4.0.2)                   
#>  pkgload       1.2.0      2021-02-23 [1] CRAN (R 4.0.2)                   
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)                   
#>  processx      3.4.5      2020-11-30 [1] CRAN (R 4.0.2)                   
#>  ps            1.5.0      2020-12-05 [1] CRAN (R 4.0.2)                   
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)                   
#>  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                   
#>  remotes       2.2.0      2020-07-21 [1] CRAN (R 4.0.2)                   
#>  rlang         0.4.10     2020-12-30 [1] CRAN (R 4.0.2)                   
#>  rmarkdown     2.7        2021-02-19 [1] CRAN (R 4.0.2)                   
#>  rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.0.2)                   
#>  sass          0.3.1      2021-01-24 [1] CRAN (R 4.0.2)                   
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                   
#>  stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                   
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)                   
#>  testthat      3.0.2      2021-02-14 [1] CRAN (R 4.0.2)                   
#>  usethis       2.0.1      2021-02-10 [1] CRAN (R 4.0.2)                   
#>  withr         2.4.1      2021-01-26 [1] CRAN (R 4.0.2)                   
#>  xfun          0.21       2021-02-10 [1] CRAN (R 4.0.2)                   
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)                   
#> 
#> [1] /Users/sebastiansaueruser/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library