Preparing German weather data

1 Load packages

library(tidyverse)  # data wrangling
library(glue)       # string interpolation
# gt and janitor are called via :: further below and must be installed

2 Motivation

In this post, we’ll prepare official German weather data. All data come from the DWD (Deutscher Wetterdienst, the German Weather Service), which provides an open data section at https://opendata.dwd.de/, from which the data were fetched.

3 Licence

The licence can be read here; in short, the data are provided free of charge for everybody’s use.

On this subpage, however, it is stated that the materials are protected by copyright and that the source must be credited as follows:

Datenbasis: Deutscher Wetterdienst, eigene Elemente ergänzt.

(Data source: Deutscher Wetterdienst, own elements added.)

Furthermore:

When speaking of spatial data, this also includes any location-related weather and climate information presented on our open web pages.

4 It’s a playful approach

I’m not pretending to do full-fledged meteorology research. This is a rather playful approach that nonetheless aims to derive some basic insights.

5 Download data

5.1 Air temperature means

Here’s the URL stem, indicating a folder where the data can be found:

url_stem <- "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/"

There’s one file for each month, providing the mean air temperature for the German Bundesländer:

months <- str_pad(1:12, width = 2, side = "left", pad = "0")
months
#>  [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
month_temp_data <- glue("regional_averages_tm_{months}.txt")
month_temp_data
#> regional_averages_tm_01.txt
#> regional_averages_tm_02.txt
#> regional_averages_tm_03.txt
#> regional_averages_tm_04.txt
#> regional_averages_tm_05.txt
#> regional_averages_tm_06.txt
#> regional_averages_tm_07.txt
#> regional_averages_tm_08.txt
#> regional_averages_tm_09.txt
#> regional_averages_tm_10.txt
#> regional_averages_tm_11.txt
#> regional_averages_tm_12.txt

Let’s glue the whole path:

air_temp_path <- paste0(url_stem, month_temp_data)
air_temp_path
#>  [1] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_01.txt"
#>  [2] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_02.txt"
#>  [3] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_03.txt"
#>  [4] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_04.txt"
#>  [5] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_05.txt"
#>  [6] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_06.txt"
#>  [7] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_07.txt"
#>  [8] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_08.txt"
#>  [9] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_09.txt"
#> [10] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_10.txt"
#> [11] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_11.txt"
#> [12] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_12.txt"
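As an aside, the twelve URLs can also be built in a single step with base R's `sprintf()`, where `%02d` pads the month number with a leading zero. This is just an alternative sketch to the `str_pad()`, `glue()` and `paste0()` steps above; `air_temp_path_alt` is a name chosen only for this illustration.

```r
# URL stem of the DWD folder with monthly mean air temperatures
url_stem <- "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/"

# %02d pads 1..9 with a leading zero, so 1 becomes "01" and 12 stays "12";
# the scalar url_stem is recycled over the twelve month numbers
air_temp_path_alt <- sprintf("%sregional_averages_tm_%02d.txt", url_stem, 1:12)

head(air_temp_path_alt, 2)
```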

6 Download multiple files and bind them together rowwise

First, let’s check a single file. Here’s the first path:

air_temp_path[1]
#> [1] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_01.txt"
d <- read_delim(air_temp_path[1], delim = ";", skip = 1)
head(d)
#> # A tibble: 6 × 20
#>    Jahr Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#>   <dbl> <chr> <chr>        <chr>   <chr>   <chr>  <chr>  <chr>   <chr>   <chr>  
#> 1  1881 01    "    -5.54"  "    -… "    -… "    … "    … "    -… "    -… "    -…
#> 2  1882 01    "     1.43"  "     … "    -… "    … "    … "     … "     … "     …
#> 3  1883 01    "    -0.41"  "    -… "    -… "    … "    … "    -… "     … "     …
#> 4  1884 01    "     3.35"  "     … "     … "    … "    … "     … "     … "     …
#> 5  1885 01    "    -2.15"  "    -… "    -… "    … "    … "    -… "    -… "    -…
#> 6  1886 01    "    -0.96"  "    -… "    -… "    … "    … "    -… "    -… "    -…
#> # … with 10 more variables: `Nordrhein-Westfalen` <chr>,
#> #   `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> #   Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> #   Thueringen <chr>, Deutschland <chr>, ...20 <lgl>, and abbreviated variable
#> #   names ¹​`Brandenburg/Berlin`, ²​Brandenburg, ³​`Baden-Wuerttemberg`,
#> #   ⁴​`Mecklenburg-Vorpommern`, ⁵​Niedersachsen, ⁶​`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names
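Note the extra column `...20` of type `<lgl>`. My guess is that every line in the DWD files ends with a trailing `;`, so the parser sees one additional, empty field per line. The following base R sketch reproduces this effect offline on a tiny excerpt in that assumed layout. The first line is a placeholder (the real files start with a line that is not the header, which is why `skip = 1` is used), and the values are taken from the output above.

```r
# Hypothetical excerpt in the assumed DWD layout: a non-header first line,
# a header line and data lines, each ending with a trailing ";"
txt <- c(
  "placeholder description line;",
  "Jahr;Monat;Brandenburg/Berlin;Deutschland;",
  "1881;01;    -5.54;    -5.36;",
  "1882;01;     1.43;     0.41;"
)

d_demo <- read.table(
  text = txt, sep = ";", skip = 1, header = TRUE,
  colClasses = "character",  # keep "01" and the padded numbers as text
  check.names = FALSE        # keep names such as "Brandenburg/Berlin"
)

ncol(d_demo)   # four real columns plus one empty trailing column
d_demo[[5]]    # the empty column produced by the trailing ";"
```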

However, for the eleventh file (November), one column is parsed as a different type:

d <- read_delim(air_temp_path[11], delim = ";", skip = 1)
head(d)
#> # A tibble: 6 × 20
#>    Jahr Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#>   <dbl> <dbl> <chr>        <chr>   <chr>   <chr>  <chr>  <chr>   <chr>   <chr>  
#> 1  1881    11 "     6.01"  "     … "     … "    … "    … "     … "     … "     …
#> 2  1882    11 "     4.01"  "     … "     … "    … "    … "     … "     … "     …
#> 3  1883    11 "     4.92"  "     … "     … "    … "    … "     … "     … "     …
#> 4  1884    11 "     1.82"  "     … "     … "    … "    … "     … "     … "     …
#> 5  1885    11 "     2.47"  "     … "     … "    … "    … "     … "     … "     …
#> 6  1886    11 "     5.59"  "     … "     … "    … "    … "     … "     … "     …
#> # … with 10 more variables: `Nordrhein-Westfalen` <chr>,
#> #   `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> #   Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> #   Thueringen <chr>, Deutschland <chr>, ...20 <lgl>, and abbreviated variable
#> #   names ¹​`Brandenburg/Berlin`, ²​Brandenburg, ³​`Baden-Wuerttemberg`,
#> #   ⁴​`Mecklenburg-Vorpommern`, ⁵​Niedersachsen, ⁶​`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names

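See column `Monat`: it is `<chr>` in the January file (values such as `01`) but `<dbl>` in the November file. Such mismatches clash when the files are bound together row-wise.

So it’s safer to read all columns as character and to parse the numbers out of the strings afterwards.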

d <- read_delim(air_temp_path[11], delim = ";", skip = 1,
                col_types = cols(.default = "c")) %>%  # read every column as character
  select(-`...20`)  # drop the empty 20th column

head(d)
#> # A tibble: 6 × 19
#>   Jahr  Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#>   <chr> <chr> <chr>        <chr>   <chr>   <chr>  <chr>  <chr>   <chr>   <chr>  
#> 1 1881  11    "     6.01"  "     … "     … "    … "    … "     … "     … "     …
#> 2 1882  11    "     4.01"  "     … "     … "    … "    … "     … "     … "     …
#> 3 1883  11    "     4.92"  "     … "     … "    … "    … "     … "     … "     …
#> 4 1884  11    "     1.82"  "     … "     … "    … "    … "     … "     … "     …
#> 5 1885  11    "     2.47"  "     … "     … "    … "    … "     … "     … "     …
#> 6 1886  11    "     5.59"  "     … "     … "    … "    … "     … "     … "     …
#> # … with 9 more variables: `Nordrhein-Westfalen` <chr>,
#> #   `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> #   Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> #   Thueringen <chr>, Deutschland <chr>, and abbreviated variable names
#> #   ¹​`Brandenburg/Berlin`, ²​Brandenburg, ³​`Baden-Wuerttemberg`,
#> #   ⁴​`Mecklenburg-Vorpommern`, ⁵​Niedersachsen, ⁶​`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names

First, download all files and append them to one tibble:

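d_air_temp <- 
  air_temp_path %>% 
  # read each file with all columns as character, then bind rows
  map_df(~ read_delim(.x, delim = ";", skip = 1, col_types = cols(.default = "c"))) %>% 
  select(-`...20`)  # drop the empty trailing column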
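The same read-and-bind logic can be sketched without network access. `read_dwd_text()` is a hypothetical helper (not part of any package) that reads one DWD-style text with all columns as character and drops the empty trailing column; `lapply()` plus `do.call(rbind, ...)` then plays the role of `map_df()`. The two in-memory excerpts stand in for the January and November files, in the layout assumed earlier, with values taken from the outputs above.

```r
# Hypothetical helper: parse one DWD-style text with all columns as character
read_dwd_text <- function(lines) {
  d <- read.table(text = lines, sep = ";", skip = 1, header = TRUE,
                  colClasses = "character", check.names = FALSE)
  d[, -ncol(d)]  # drop the empty column created by the trailing ";"
}

# Excerpts standing in for the January and the November file
jan <- c("placeholder;", "Jahr;Monat;Brandenburg/Berlin;", "1881;01;    -5.54;")
nov <- c("placeholder;", "Jahr;Monat;Brandenburg/Berlin;", "1881;11;     6.01;")

# Read each text, then bind the pieces row-wise
d_all <- do.call(rbind, lapply(list(jan, nov), read_dwd_text))
str(d_all)
```

Because both `Monat` columns arrive as character (`"01"` and `"11"`), binding them works without a type clash.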

Second, let’s parse the numbers out of the strings.

d_air_temp2 <-
  d_air_temp %>% 
  mutate(across(.cols = everything(),
                 .fns = parse_number))

d_air_temp2 %>% 
  head() %>% 
  gt::gt()
Jahr Monat Brandenburg/Berlin Brandenburg Baden-Wuerttemberg Bayern Hessen Mecklenburg-Vorpommern Niedersachsen Niedersachsen/Hamburg/Bremen Nordrhein-Westfalen Rheinland-Pfalz Schleswig-Holstein Saarland Sachsen Sachsen-Anhalt Thueringen/Sachsen-Anhalt Thueringen Deutschland
1881 1 -5.54 -5.56 -4.89 -6.51 -5.68 -5.07 -4.55 -4.55 -4.21 -4.49 -4.06 -4.15 -6.22 -5.89 -6.28 -6.76 -5.36
1882 1 1.43 1.42 -0.65 -1.33 -0.30 1.59 1.73 1.73 1.26 -0.17 2.11 -0.04 0.63 1.18 0.60 -0.12 0.41
1883 1 -0.41 -0.42 -0.09 -1.49 0.04 -0.69 0.33 0.32 1.27 0.49 -0.12 0.55 -1.20 -0.39 -0.69 -1.07 -0.32
1884 1 3.35 3.34 2.34 1.22 2.97 3.16 3.99 3.99 4.17 3.24 3.56 3.24 2.39 3.32 2.93 2.43 2.86
1885 1 -2.15 -2.16 -3.88 -4.91 -2.66 -1.71 -1.44 -1.44 -1.14 -2.01 -1.09 -2.00 -2.87 -2.86 -3.36 -3.98 -2.79
1886 1 -0.96 -0.96 -1.44 -2.95 -1.01 -0.97 -0.41 -0.41 0.19 -0.34 -0.50 -0.22 -1.59 -1.11 -1.51 -2.01 -1.25
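`parse_number()` strips the padding and any other non-numeric characters surrounding a number. For strings that contain nothing but a (padded) number, base R's `as.numeric()` after `trimws()` yields the same values. A minimal sketch applying this to every column of a small data frame built from the first rows shown above:

```r
# Character columns as they come out of the reading step
d_chr <- data.frame(
  Jahr  = c("1881", "1882"),
  Monat = c("01", "01"),
  Deutschland = c("    -5.36", "     0.41"),
  stringsAsFactors = FALSE
)

# Convert every column: trim the padding, then coerce to numeric
d_num <- d_chr
d_num[] <- lapply(d_chr, function(col) as.numeric(trimws(col)))

str(d_num)  # Monat becomes 1, Deutschland becomes -5.36 and 0.41
```

One difference worth knowing: `as.numeric()` is stricter and returns `NA` (with a warning) for a string like `"5.54abc"`, where `parse_number()` would return 5.54.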

7 Format to long

d_air_temp3 <-
  d_air_temp2 %>% 
  pivot_longer(cols = -c(Jahr, Monat), names_to = "region", values_to = "temp") %>% 
  rename(year = Jahr, month = Monat)
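Conceptually, `pivot_longer()` stacks the region columns: each (year, month) row becomes one row per region, so r rows and k region columns yield r * k rows. A base R sketch of the same reshaping, using the first two January rows for two regions from the table above:

```r
# Small wide table: one column per region (values from the table above)
wide <- data.frame(
  Jahr  = c(1881, 1882),
  Monat = c(1, 1),
  `Brandenburg/Berlin` = c(-5.54, 1.43),
  Deutschland = c(-5.36, 0.41),
  check.names = FALSE
)

region_cols <- setdiff(names(wide), c("Jahr", "Monat"))

# Stack the region columns on top of each other, repeating year and month
long <- data.frame(
  year   = rep(wide$Jahr, times = length(region_cols)),
  month  = rep(wide$Monat, times = length(region_cols)),
  region = rep(region_cols, each = nrow(wide)),
  temp   = unlist(wide[region_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```

Unlike `pivot_longer()`, which keeps the rows of each year together, this sketch orders the result region by region; the content is the same.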

8 More post-processing

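d_air_temp3 <-
  d_air_temp3 %>% 
  # standardize column names to lower snake_case
  # (a safeguard: year, month, region and temp already comply)
  janitor::clean_names()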

9 Save to disk

I’m saving to my personal folder. Your folder path will be different.

write_csv(d_air_temp3, file = "/Users/sebastiansaueruser/github-repos/datasets/csv/air_temp_de_dev_DWD.csv")

save(d_air_temp3, file = "/Users/sebastiansaueruser/github-repos/datasets/Rdata/air_temp_de_dev_DWD.Rdata")
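Since the paths above are specific to my machine, here’s a sketch of a more portable variant. It builds the file paths with `file.path()` from a folder of your choice (here `tempdir()`, purely as a stand-in) and checks that the data survive a round trip. It uses base R's `write.csv()`, `save()` and `load()`; readr's `write_csv()` works the same way for the CSV part. The file names and `d_save_demo` are made up for this illustration.

```r
out_dir <- tempdir()  # stand-in: replace with your own folder

# Small example in the long format (values from the table above)
d_save_demo <- data.frame(
  year = c(1881L, 1881L), month = c(1L, 1L),
  region = c("Brandenburg/Berlin", "Deutschland"),
  temp = c(-5.54, -5.36),
  stringsAsFactors = FALSE
)

csv_file   <- file.path(out_dir, "air_temp_demo.csv")
rdata_file <- file.path(out_dir, "air_temp_demo.Rdata")

write.csv(d_save_demo, csv_file, row.names = FALSE)
save(d_save_demo, file = rdata_file)

# Read both back; load() into a separate environment to avoid overwriting
d_csv <- read.csv(csv_file, stringsAsFactors = FALSE)
env <- new.env()
load(rdata_file, envir = env)
```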

10 Precipitation

11 Debrief

This is nice stuff. However, the temperatures provided are deviations from a long-term average, which may be of interest at times but are less tangible than actual temperatures.

12 Reproducibility

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22)
#>  os       macOS Big Sur/Monterey 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2022-07-24
#>  pandoc   2.18 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
#>  blogdown      1.10    2022-05-10 [1] CRAN (R 4.2.0)
#>  bookdown      0.27    2022-06-14 [1] CRAN (R 4.2.0)
#>  brio          1.1.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  broom         1.0.0   2022-07-01 [1] CRAN (R 4.2.0)
#>  bslib         0.3.1   2021-10-06 [1] CRAN (R 4.2.0)
#>  cachem        1.0.6   2021-08-19 [1] CRAN (R 4.2.0)
#>  callr         3.7.0   2021-04-20 [1] CRAN (R 4.2.0)
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.2.0)
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  codetools     0.2-18  2020-11-04 [2] CRAN (R 4.2.0)
#>  colorout    * 1.2-2   2022-06-13 [1] local
#>  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
#>  dbplyr        2.2.0   2022-06-05 [1] CRAN (R 4.2.0)
#>  desc          1.4.1   2022-03-06 [1] CRAN (R 4.2.0)
#>  devtools      2.4.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2     * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.2.0)
#>  haven         2.5.0   2022-04-15 [1] CRAN (R 4.2.0)
#>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
#>  httr          1.4.3   2022-05-04 [1] CRAN (R 4.2.0)
#>  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
#>  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.2.0)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.2.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgbuild      1.3.1   2021-12-20 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  pkgload       1.2.4   2021-11-30 [1] CRAN (R 4.2.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.2.0)
#>  processx      3.6.1   2022-06-17 [1] CRAN (R 4.2.0)
#>  ps            1.7.1   2022-06-18 [1] CRAN (R 4.2.0)
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  readr       * 2.1.2   2022-01-30 [1] CRAN (R 4.2.0)
#>  readxl        1.4.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.3   2022-06-27 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  rvest         1.0.2   2021-10-16 [1] CRAN (R 4.2.0)
#>  sass          0.4.1   2022-03-23 [1] CRAN (R 4.2.0)
#>  scales        1.2.0   2022-04-13 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  testthat      3.1.4   2022-04-26 [1] CRAN (R 4.2.0)
#>  tibble      * 3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
#>  tidyr       * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
#>  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.2.0)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.2.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
#>  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/sebastiansaueruser/Rlibs
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────