1 Load packages
2 Motivation
3 Licence
4 It’s a playful approach
5 Download data
- 5.1 Air temperature means
6 Download multiple files and bind them together rowwise
7 Format to long
8 More post-processing
9 Save to disk
10 Precipitation
11 Debrief
12 Reproducibility

1 Load packages

library(tidyverse)  # data wrangling
library(glue)

2 Motivation

In this post, we’ll prepare official German weather data. All data are accessed from DWD.

More specifically, the DWD (Deutscher Wetterdienst, German Weather Service) provides an open data section at https://opendata.dwd.de/, from which the data were fetched.

3 Licence

The licence can be read here; in short, the data are provided free of charge for everybody’s use.

On this subpage, however, it is stated that the materials are copyright-protected and that the source must be cited like this:

Datenbasis: Deutscher Wetterdienst, eigene Elemente ergänzt.

(Data source: Deutscher Wetterdienst, own elements added).

And, furthermore:

When speaking of spatial data, this also includes any location-related weather and climate information presented on our open web pages.

4 It’s a playful approach

I’m not pretending to do full-fledged meteorology research. What’s being done here is a rather playful approach to derive some basic insights nonetheless.

5 Download data

5.1 Air temperature means

Here’s the URL stem, indicating a folder where the data can be found:

url_stem <- "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/"

There’s one file for each month, providing the mean air temperature for the German Bundesländer:

months <- str_pad(1:12, width = 2, side = "left", pad = "0")
months
#>  [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
month_temp_data <- glue("regional_averages_tm_{months}.txt")
month_temp_data
#> regional_averages_tm_01.txt
#> regional_averages_tm_02.txt
#> regional_averages_tm_03.txt
#> regional_averages_tm_04.txt
#> regional_averages_tm_05.txt
#> regional_averages_tm_06.txt
#> regional_averages_tm_07.txt
#> regional_averages_tm_08.txt
#> regional_averages_tm_09.txt
#> regional_averages_tm_10.txt
#> regional_averages_tm_11.txt
#> regional_averages_tm_12.txt

Let’s glue the whole path:

air_temp_path <- paste0(url_stem, month_temp_data)
air_temp_path
#>  [1] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_01.txt"
#>  [2] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_02.txt"
#>  [3] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_03.txt"
#>  [4] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_04.txt"
#>  [5] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_05.txt"
#>  [6] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_06.txt"
#>  [7] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_07.txt"
#>  [8] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_08.txt"
#>  [9] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_09.txt"
#> [10] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_10.txt"
#> [11] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_11.txt"
#> [12] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_12.txt"
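
If the same kind of path is needed more than once, the steps above could be wrapped in a small function. This is only a sketch: the function name `build_dwd_paths` and its argument `file_prefix` are hypothetical and not part of the original workflow, and base R's `sprintf()` stands in for `str_pad()`.

```r
# Hypothetical helper (an assumption, not part of the original post):
# build the twelve monthly file URLs from a folder URL and a file-name prefix.
build_dwd_paths <- function(url_stem, file_prefix = "regional_averages_tm_") {
  months <- sprintf("%02d", 1:12)  # zero-padded month numbers "01" to "12"
  paste0(url_stem, file_prefix, months, ".txt")
}

url_stem <- "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/"
air_temp_path <- build_dwd_paths(url_stem)
length(air_temp_path)
```

The default prefix matches the file names shown above; the prefix for any other DWD folder would have to be looked up there.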

6 Download multiple files and bind them together rowwise

First, let’s check a single file. Here’s the first path:

air_temp_path[1]
#> [1] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_01.txt"

d <- read_delim(air_temp_path[1], delim = ";", skip = 1)
head(d)
#> # A tibble: 6 × 20
#>    Jahr Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#>   <dbl> <chr> <chr>        <chr>   <chr>   <chr>  <chr>  <chr>   <chr>   <chr>  
#> 1  1881 01    "    -5.54"  "    -… "    -… "    … "    … "    -… "    -… "    -…
#> 2  1882 01    "     1.43"  "     … "    -… "    … "    … "     … "     … "     …
#> 3  1883 01    "    -0.41"  "    -… "    -… "    … "    … "    -… "     … "     …
#> 4  1884 01    "     3.35"  "     … "     … "    … "    … "     … "     … "     …
#> 5  1885 01    "    -2.15"  "    -… "    -… "    … "    … "    -… "    -… "    -…
#> 6  1886 01    "    -0.96"  "    -… "    -… "    … "    … "    -… "    -… "    -…
#> # … with 10 more variables: `Nordrhein-Westfalen` <chr>,
#> #   `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> #   Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> #   Thueringen <chr>, Deutschland <chr>, ...20 <lgl>, and abbreviated variable
#> #   names ¹`Brandenburg/Berlin`, ²Brandenburg, ³`Baden-Wuerttemberg`,
#> #   ⁴`Mecklenburg-Vorpommern`, ⁵Niedersachsen, ⁶`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names

However, in the November file, some columns are of a different type:

d <- read_delim(air_temp_path[11], delim = ";", skip = 1)
head(d)
#> # A tibble: 6 × 20
#>    Jahr Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#>   <dbl> <dbl> <chr>        <chr>   <chr>   <chr>  <chr>  <chr>   <chr>   <chr>  
#> 1  1881    11 "     6.01"  "     … "     … "    … "    … "     … "     … "     …
#> 2  1882    11 "     4.01"  "     … "     … "    … "    … "     … "     … "     …
#> 3  1883    11 "     4.92"  "     … "     … "    … "    … "     … "     … "     …
#> 4  1884    11 "     1.82"  "     … "     … "    … "    … "     … "     … "     …
#> 5  1885    11 "     2.47"  "     … "     … "    … "    … "     … "     … "     …
#> 6  1886    11 "     5.59"  "     … "     … "    … "    … "     … "     … "     …
#> # … with 10 more variables: `Nordrhein-Westfalen` <chr>,
#> #   `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> #   Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> #   Thueringen <chr>, Deutschland <chr>, ...20 <lgl>, and abbreviated variable
#> #   names ¹`Brandenburg/Berlin`, ²Brandenburg, ³`Baden-Wuerttemberg`,
#> #   ⁴`Mecklenburg-Vorpommern`, ⁵Niedersachsen, ⁶`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names

See the column Monat: it is read as character in the January file, but as double in the November file.

So we had better read all columns as character and parse the numbers out of the strings afterwards.

d <- read_delim(air_temp_path[11], delim = ";", skip = 1, col_types = cols(.default = "c")) %>% select(-`...20`)

head(d)
#> # A tibble: 6 × 19
#>   Jahr  Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#>   <chr> <chr> <chr>        <chr>   <chr>   <chr>  <chr>  <chr>   <chr>   <chr>  
#> 1 1881  11    "     6.01"  "     … "     … "    … "    … "     … "     … "     …
#> 2 1882  11    "     4.01"  "     … "     … "    … "    … "     … "     … "     …
#> 3 1883  11    "     4.92"  "     … "     … "    … "    … "     … "     … "     …
#> 4 1884  11    "     1.82"  "     … "     … "    … "    … "     … "     … "     …
#> 5 1885  11    "     2.47"  "     … "     … "    … "    … "     … "     … "     …
#> 6 1886  11    "     5.59"  "     … "     … "    … "    … "     … "     … "     …
#> # … with 9 more variables: `Nordrhein-Westfalen` <chr>,
#> #   `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> #   Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> #   Thueringen <chr>, Deutschland <chr>, and abbreviated variable names
#> #   ¹`Brandenburg/Berlin`, ²Brandenburg, ³`Baden-Wuerttemberg`,
#> #   ⁴`Mecklenburg-Vorpommern`, ⁵Niedersachsen, ⁶`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names
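
To see why the differing type of Monat matters, here is a minimal sketch with two toy tibbles standing in for two monthly files (the values are made up for illustration, not taken from the DWD). The object names `jan`, `nov`, `res`, and `both` are arbitrary.

```r
library(dplyr)

# Toy stand-ins for two monthly files (illustrative values only)
jan <- tibble(Jahr = 1881, Monat = "01")  # Monat as character
nov <- tibble(Jahr = 1881, Monat = 11)    # Monat as double

# Row-binding a character column with a double column raises an error;
# the error is caught here so that the script keeps running
res <- tryCatch(bind_rows(jan, nov), error = function(e) "type conflict")
res

# With both columns as character, the bind works
both <- bind_rows(jan, mutate(nov, Monat = as.character(Monat)))
both
```

That’s the reason for forcing one common type at import time rather than repairing each file separately.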

First, download all files and append to one tibble:

d_air_temp <- 
  air_temp_path %>% 
  map_df(~ read_delim(., delim = ";", skip = 1, col_types = cols(.default = "c"))) %>% 
  select(-`...20`)
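
The read-and-bind pattern can be tried offline, too. The following sketch writes two tiny semicolon-delimited files that only mimic the layout described above (a title line to skip, a header, a trailing `;`); the file contents and the helper `make_file` are invented for illustration. It uses `map()` plus `bind_rows()`, which should behave like `map_df()` here.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical helper: write a toy file that mimics the layout of the DWD files
make_file <- function(month) {
  path <- tempfile(fileext = ".txt")
  write_lines(c("toy title line",
                "Jahr;Monat;Bayern;",
                paste0("1881;", month, ";    -6.51;")),
              path)
  path
}

paths <- c(make_file("01"), make_file("11"))

d_toy <- paths %>%
  map(~ read_delim(.x, delim = ";", skip = 1,
                   col_types = cols(.default = "c"))) %>%
  bind_rows() %>%
  select(-starts_with("..."))  # drop the empty column caused by the trailing ";"

d_toy
```

Dropping the empty column via `starts_with("...")` avoids hard-coding its position, which is a design choice of this sketch and not what the post does.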

Second, let’s parse the numbers out of the strings.

d_air_temp2 <-
  d_air_temp %>% 
  mutate(across(.cols = everything(),
                 .fns = parse_number))

d_air_temp2 %>% 
  head() %>% 
  gt::gt()

Jahr	Monat	Brandenburg/Berlin	Brandenburg	Baden-Wuerttemberg	Bayern	Hessen	Mecklenburg-Vorpommern	Niedersachsen	Niedersachsen/Hamburg/Bremen	Nordrhein-Westfalen	Rheinland-Pfalz	Schleswig-Holstein	Saarland	Sachsen	Sachsen-Anhalt	Thueringen/Sachsen-Anhalt	Thueringen	Deutschland
1881	1	-5.54	-5.56	-4.89	-6.51	-5.68	-5.07	-4.55	-4.55	-4.21	-4.49	-4.06	-4.15	-6.22	-5.89	-6.28	-6.76	-5.36
1882	1	1.43	1.42	-0.65	-1.33	-0.30	1.59	1.73	1.73	1.26	-0.17	2.11	-0.04	0.63	1.18	0.60	-0.12	0.41
1883	1	-0.41	-0.42	-0.09	-1.49	0.04	-0.69	0.33	0.32	1.27	0.49	-0.12	0.55	-1.20	-0.39	-0.69	-1.07	-0.32
1884	1	3.35	3.34	2.34	1.22	2.97	3.16	3.99	3.99	4.17	3.24	3.56	3.24	2.39	3.32	2.93	2.43	2.86
1885	1	-2.15	-2.16	-3.88	-4.91	-2.66	-1.71	-1.44	-1.44	-1.14	-2.01	-1.09	-2.00	-2.87	-2.86	-3.36	-3.98	-2.79
1886	1	-0.96	-0.96	-1.44	-2.95	-1.01	-0.97	-0.41	-0.41	0.19	-0.34	-0.50	-0.22	-1.59	-1.11	-1.51	-2.01	-1.25
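
How `parse_number()` deals with the padded strings can be checked on a few values. The vector `x` below is just an illustration; its entries mirror the kind of strings seen in the raw output above.

```r
library(readr)

# Strings as they come out of the raw files: padded with leading blanks
x <- c("    -5.54", "     1.43", "01", "1881")

# parse_number() ignores the padding and returns plain numbers
x_num <- parse_number(x)
x_num
```

Note that the zero-padded month "01" becomes the number 1, which is why Monat shows up as 1 in the table.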

7 Format to long

d_air_temp3 <-
  d_air_temp2 %>% 
  pivot_longer(cols = -c(Jahr, Monat), names_to = "region", values_to = "temp") %>% 
  rename(year = Jahr, month = Monat)
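
To make the reshaping more tangible, here is a small sketch using two rows and two regions copied from the table above. The object names `wide` and `long` are arbitrary.

```r
library(tidyr)
library(dplyr)

# Two years, two regions, taken from the table above
wide <- tibble(
  Jahr   = c(1881, 1882),
  Monat  = c(1, 1),
  Bayern = c(-6.51, -1.33),
  Hessen = c(-5.68, -0.30)
)

# One row per year, month, and region
long <- wide %>%
  pivot_longer(cols = -c(Jahr, Monat), names_to = "region", values_to = "temp") %>%
  rename(year = Jahr, month = Monat)

long
```

Each of the two wide rows yields one long row per region, so `long` has four rows.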

8 More post-processing

d_air_temp3 <-
  d_air_temp3 %>% 
  janitor::clean_names()

9 Save to disk

I’m saving to my personal folder. Your folder path will be different.

write_csv(d_air_temp3, file = "/Users/sebastiansaueruser/github-repos/datasets/csv/air_temp_de_dev_DWD.csv")

save(d_air_temp3, file = "/Users/sebastiansaueruser/github-repos/datasets/Rdata/air_temp_de_dev_DWD.Rdata")
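
A quick way to check that the CSV export survives a round trip is to write to a temporary folder and read the file back. This is only a sketch: `d_toy` is a one-row stand-in for `d_air_temp3`, and `tempdir()` is used merely so that the code runs on any machine.

```r
library(readr)

# One-row stand-in for d_air_temp3 (value taken from the table above)
d_toy <- tibble::tibble(year = 1881, month = 1, region = "Bayern", temp = -6.51)

# Write to a temporary folder instead of a personal path
out_file <- file.path(tempdir(), "air_temp_toy.csv")
write_csv(d_toy, file = out_file)

# Read the file back and inspect it
d_back <- read_csv(out_file, show_col_types = FALSE)
d_back
```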

10 Precipitation

11 Debrief

This is nice stuff. However, the temperatures provided are deviations from a long-term average, which may be of interest at times, but are not as tangible as actual temperatures.

12 Reproducibility

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22)
#>  os       macOS Big Sur/Monterey 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2022-07-24
#>  pandoc   2.18 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
#>  blogdown      1.10    2022-05-10 [1] CRAN (R 4.2.0)
#>  bookdown      0.27    2022-06-14 [1] CRAN (R 4.2.0)
#>  brio          1.1.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  broom         1.0.0   2022-07-01 [1] CRAN (R 4.2.0)
#>  bslib         0.3.1   2021-10-06 [1] CRAN (R 4.2.0)
#>  cachem        1.0.6   2021-08-19 [1] CRAN (R 4.2.0)
#>  callr         3.7.0   2021-04-20 [1] CRAN (R 4.2.0)
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.2.0)
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  codetools     0.2-18  2020-11-04 [2] CRAN (R 4.2.0)
#>  colorout    * 1.2-2   2022-06-13 [1] local
#>  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
#>  dbplyr        2.2.0   2022-06-05 [1] CRAN (R 4.2.0)
#>  desc          1.4.1   2022-03-06 [1] CRAN (R 4.2.0)
#>  devtools      2.4.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2     * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.2.0)
#>  haven         2.5.0   2022-04-15 [1] CRAN (R 4.2.0)
#>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
#>  httr          1.4.3   2022-05-04 [1] CRAN (R 4.2.0)
#>  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
#>  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.2.0)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.2.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgbuild      1.3.1   2021-12-20 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  pkgload       1.2.4   2021-11-30 [1] CRAN (R 4.2.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.2.0)
#>  processx      3.6.1   2022-06-17 [1] CRAN (R 4.2.0)
#>  ps            1.7.1   2022-06-18 [1] CRAN (R 4.2.0)
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  readr       * 2.1.2   2022-01-30 [1] CRAN (R 4.2.0)
#>  readxl        1.4.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.3   2022-06-27 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  rvest         1.0.2   2021-10-16 [1] CRAN (R 4.2.0)
#>  sass          0.4.1   2022-03-23 [1] CRAN (R 4.2.0)
#>  scales        1.2.0   2022-04-13 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  testthat      3.1.4   2022-04-26 [1] CRAN (R 4.2.0)
#>  tibble      * 3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
#>  tidyr       * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
#>  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.2.0)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.2.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
#>  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/sebastiansaueruser/Rlibs
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Preparing German weather data