1 Load packages
library(tidyverse) # data wrangling
library(glue)
2 Motivation
In this post, we’ll prepare official German weather data. All data are accessed from DWD.
More specifically, the DWD (Deutscher Wetterdienst, German Weather Service) provides an open data section https://opendata.dwd.de/, from where the data were fetched.
3 Licence
The licence can be read here. In short, the data are provided free of charge for everybody’s use.
On this subpage, however, it is stated that the materials are protected by copyright and that the source must be declared like this:
Datenbasis: Deutscher Wetterdienst, eigene Elemente ergänzt.
(Data source: Deutscher Wetterdienst, own elements added).
And, furthermore:
When speaking of spatial data, this also includes any location-related weather and climate information presented on our open web pages.
4 It’s a playful approach
I’m not pretending to do full-fledged meteorology research. What’s being done here is a rather playful approach to derive some basic insights nonetheless.
5 Download data
5.1 Air temperature means
Here’s the URL stem, indicating a folder where the data can be found:
url_stem <- "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/"
There’s one file for each month, providing the mean air temperature for the German Bundesländer:
months <- str_pad(1:12, width = 2, side = "left", pad = "0")
months
#> [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
month_temp_data <- glue("regional_averages_tm_{months}.txt")
month_temp_data
#> regional_averages_tm_01.txt
#> regional_averages_tm_02.txt
#> regional_averages_tm_03.txt
#> regional_averages_tm_04.txt
#> regional_averages_tm_05.txt
#> regional_averages_tm_06.txt
#> regional_averages_tm_07.txt
#> regional_averages_tm_08.txt
#> regional_averages_tm_09.txt
#> regional_averages_tm_10.txt
#> regional_averages_tm_11.txt
#> regional_averages_tm_12.txt
Let’s glue the whole path:
air_temp_path <- paste0(url_stem, month_temp_data)
air_temp_path
#> [1] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_01.txt"
#> [2] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_02.txt"
#> [3] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_03.txt"
#> [4] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_04.txt"
#> [5] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_05.txt"
#> [6] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_06.txt"
#> [7] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_07.txt"
#> [8] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_08.txt"
#> [9] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_09.txt"
#> [10] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_10.txt"
#> [11] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_11.txt"
#> [12] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_12.txt"
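As a side note, the same file names and URLs can be built with base R alone. The sketch below assumes only the URL stem and the file-name pattern shown above; `files_base` and `urls_base` are hypothetical names introduced for illustration.

```r
# URL stem and file-name pattern as used above
url_stem <- "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/"

# sprintf() with "%02d" left-pads the month number with a zero
files_base <- sprintf("regional_averages_tm_%02d.txt", 1:12)

# url_stem already ends in "/", so paste0() is enough
urls_base <- paste0(url_stem, files_base)

# Sanity checks: twelve URLs, each ending in a two-digit month plus ".txt"
length(urls_base)
all(grepl("regional_averages_tm_[0-9]{2}\\.txt$", urls_base))
```

Checks like these are cheap and catch a mistyped pattern before any download is attempted.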
6 Download multiple files and bind them together rowwise
First, let’s check a single file. Here’s the first path:
air_temp_path[1]
#> [1] "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_01.txt"
d <- read_delim(air_temp_path[1], delim = ";", skip = 1)
head(d)
#> # A tibble: 6 × 20
#> Jahr Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1881 01 " -5.54" " -… " -… " … " … " -… " -… " -…
#> 2 1882 01 " 1.43" " … " -… " … " … " … " … " …
#> 3 1883 01 " -0.41" " -… " -… " … " … " -… " … " …
#> 4 1884 01 " 3.35" " … " … " … " … " … " … " …
#> 5 1885 01 " -2.15" " -… " -… " … " … " -… " -… " -…
#> 6 1886 01 " -0.96" " -… " -… " … " … " -… " -… " -…
#> # … with 10 more variables: `Nordrhein-Westfalen` <chr>,
#> # `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> # Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> # Thueringen <chr>, Deutschland <chr>, ...20 <lgl>, and abbreviated variable
#> # names ¹`Brandenburg/Berlin`, ²Brandenburg, ³`Baden-Wuerttemberg`,
#> # ⁴`Mecklenburg-Vorpommern`, ⁵Niedersachsen, ⁶`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names
However, in another file (here, the one for November), some columns are of a different type:
d <- read_delim(air_temp_path[11], delim = ";", skip = 1)
head(d)
#> # A tibble: 6 × 20
#> Jahr Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1881 11 " 6.01" " … " … " … " … " … " … " …
#> 2 1882 11 " 4.01" " … " … " … " … " … " … " …
#> 3 1883 11 " 4.92" " … " … " … " … " … " … " …
#> 4 1884 11 " 1.82" " … " … " … " … " … " … " …
#> 5 1885 11 " 2.47" " … " … " … " … " … " … " …
#> 6 1886 11 " 5.59" " … " … " … " … " … " … " …
#> # … with 10 more variables: `Nordrhein-Westfalen` <chr>,
#> # `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> # Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> # Thueringen <chr>, Deutschland <chr>, ...20 <lgl>, and abbreviated variable
#> # names ¹`Brandenburg/Berlin`, ²Brandenburg, ³`Baden-Wuerttemberg`,
#> # ⁴`Mecklenburg-Vorpommern`, ⁵Niedersachsen, ⁶`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names
See the column Monat: it was read as <chr> in the January file, but as <dbl> in the November file.
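Why does the differing type matter? Row-binding two tibbles fails if the same column is character in one and numeric in the other. Here is a minimal sketch with two made-up one-row tibbles (`jan` and `nov` are hypothetical stand-ins for the January and November files):

```r
library(tibble)
library(dplyr)

# Stand-ins for two monthly files: "01" was read as text, 11 as a number
jan <- tibble(Jahr = 1881, Monat = "01")
nov <- tibble(Jahr = 1881, Monat = 11)

# bind_rows() refuses to combine a character column with a double column
res <- tryCatch(bind_rows(jan, nov), error = function(e) "cannot combine")
res

# With every column converted to character, binding works
jan_chr <- mutate(jan, across(everything(), as.character))
nov_chr <- mutate(nov, across(everything(), as.character))
both <- bind_rows(jan_chr, nov_chr)
both$Monat
```

Converting after the fact is only for illustration; reading everything as character from the start avoids the problem altogether.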
So it’s better to read all columns as character and then parse the numbers out of the strings afterwards.
d <- read_delim(air_temp_path[11], delim = ";", skip = 1, col_types = cols(.default = "c")) %>% select(-`...20`)
head(d)
#> # A tibble: 6 × 19
#> Jahr Monat Brandenbur…¹ Brand…² Baden…³ Bayern Hessen Meckl…⁴ Niede…⁵ Niede…⁶
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1881 11 " 6.01" " … " … " … " … " … " … " …
#> 2 1882 11 " 4.01" " … " … " … " … " … " … " …
#> 3 1883 11 " 4.92" " … " … " … " … " … " … " …
#> 4 1884 11 " 1.82" " … " … " … " … " … " … " …
#> 5 1885 11 " 2.47" " … " … " … " … " … " … " …
#> 6 1886 11 " 5.59" " … " … " … " … " … " … " …
#> # … with 9 more variables: `Nordrhein-Westfalen` <chr>,
#> # `Rheinland-Pfalz` <chr>, `Schleswig-Holstein` <chr>, Saarland <chr>,
#> # Sachsen <chr>, `Sachsen-Anhalt` <chr>, `Thueringen/Sachsen-Anhalt` <chr>,
#> # Thueringen <chr>, Deutschland <chr>, and abbreviated variable names
#> # ¹`Brandenburg/Berlin`, ²Brandenburg, ³`Baden-Wuerttemberg`,
#> # ⁴`Mecklenburg-Vorpommern`, ⁵Niedersachsen, ⁶`Niedersachsen/Hamburg/Bremen`
#> # ℹ Use `colnames()` to see all variable names
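The column `...20` most likely stems from a trailing semicolon at the end of each line, which readr interprets as one more, unnamed column. That is an assumption about the file layout, illustrated here with a small made-up file (four columns instead of nineteen; the first line stands in for the description line that `skip = 1` drops):

```r
library(readr)
library(dplyr)

# A made-up miniature of a DWD file; every line ends in ";"
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Description line that gets skipped",
  "Jahr;Monat;Bayern;Deutschland;",
  "1881;01;   -6.51;   -5.36;",
  "1882;01;   -1.33;    0.41;"
), tmp)

d_mini <- read_delim(tmp, delim = ";", skip = 1,
                     col_types = cols(.default = "c"))
names(d_mini)  # the unnamed last column gets an auto-generated name

# Drop auto-named columns by prefix instead of hard-coding the position
d_mini <- select(d_mini, -starts_with("..."))
names(d_mini)
```

Selecting by prefix is a design choice: it keeps working if the number of columns in the source files ever changes.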
First, download all files and append them to one tibble:
d_air_temp <-
  air_temp_path %>%
  map_df(~ read_delim(., delim = ";", skip = 1, col_types = cols(.default = "c"))) %>%
  select(-`...20`)
Second, let’s parse the numbers out of the strings.
d_air_temp2 <-
  d_air_temp %>%
  mutate(across(.cols = everything(),
                .fns = parse_number))
d_air_temp2 %>%
  head() %>%
  gt::gt()
Jahr | Monat | Brandenburg/Berlin | Brandenburg | Baden-Wuerttemberg | Bayern | Hessen | Mecklenburg-Vorpommern | Niedersachsen | Niedersachsen/Hamburg/Bremen | Nordrhein-Westfalen | Rheinland-Pfalz | Schleswig-Holstein | Saarland | Sachsen | Sachsen-Anhalt | Thueringen/Sachsen-Anhalt | Thueringen | Deutschland |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1881 | 1 | -5.54 | -5.56 | -4.89 | -6.51 | -5.68 | -5.07 | -4.55 | -4.55 | -4.21 | -4.49 | -4.06 | -4.15 | -6.22 | -5.89 | -6.28 | -6.76 | -5.36 |
1882 | 1 | 1.43 | 1.42 | -0.65 | -1.33 | -0.30 | 1.59 | 1.73 | 1.73 | 1.26 | -0.17 | 2.11 | -0.04 | 0.63 | 1.18 | 0.60 | -0.12 | 0.41 |
1883 | 1 | -0.41 | -0.42 | -0.09 | -1.49 | 0.04 | -0.69 | 0.33 | 0.32 | 1.27 | 0.49 | -0.12 | 0.55 | -1.20 | -0.39 | -0.69 | -1.07 | -0.32 |
1884 | 1 | 3.35 | 3.34 | 2.34 | 1.22 | 2.97 | 3.16 | 3.99 | 3.99 | 4.17 | 3.24 | 3.56 | 3.24 | 2.39 | 3.32 | 2.93 | 2.43 | 2.86 |
1885 | 1 | -2.15 | -2.16 | -3.88 | -4.91 | -2.66 | -1.71 | -1.44 | -1.44 | -1.14 | -2.01 | -1.09 | -2.00 | -2.87 | -2.86 | -3.36 | -3.98 | -2.79 |
1886 | 1 | -0.96 | -0.96 | -1.44 | -2.95 | -1.01 | -0.97 | -0.41 | -0.41 | 0.19 | -0.34 | -0.50 | -0.22 | -1.59 | -1.11 | -1.51 | -2.01 | -1.25 |
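To see what `parse_number()` does with the padded strings, here is a small sketch on a hand-typed vector (the values are taken from the first rows shown above; `raw` and `nums` are names made up for this example):

```r
library(readr)

raw <- c("   -5.54", "    1.43", "01", "1881")

# parse_number() skips the leading blanks and returns plain doubles;
# "01" becomes 1, so the leading zero of the month is lost
nums <- parse_number(raw)
nums
```

That is why `Monat` shows up as 1 rather than 01 in the table above.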
7 Format to long
d_air_temp3 <-
  d_air_temp2 %>%
  pivot_longer(cols = -c(Jahr, Monat), names_to = "region", values_to = "temp") %>%
  rename(year = Jahr, month = Monat)
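What `pivot_longer()` does here is easier to see on a toy table. The sketch below uses two regions and two years copied from the table above; `wide` and `long` are hypothetical names:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Two years, two regions, in the wide layout of the DWD files
wide <- tibble(
  Jahr   = c(1881, 1882),
  Monat  = c(1, 1),
  Bayern = c(-6.51, -1.33),
  Hessen = c(-5.68, -0.30)
)

# Every region column becomes a (region, temp) pair per year and month
long <- wide %>%
  pivot_longer(cols = -c(Jahr, Monat), names_to = "region", values_to = "temp") %>%
  rename(year = Jahr, month = Monat)

long
```

Two rows with two region columns turn into four rows, one per combination of year, month, and region.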
8 More post-processing
d_air_temp3 <-
d_air_temp3 %>%
janitor::clean_names()
9 Save to disk
I’m saving to my personal folder. Your folder path will be different.
write_csv(d_air_temp3, file = "/Users/sebastiansaueruser/github-repos/datasets/csv/air_temp_de_dev_DWD.csv")
save(d_air_temp3, file = "/Users/sebastiansaueruser/github-repos/datasets/Rdata/air_temp_de_dev_DWD.Rdata")
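If you just want to try the export step without committing to a folder layout, a temporary directory does the job. This is a minimal sketch, assuming a small stand-in tibble instead of the full data (`d_demo`, `out_dir`, and the file names are hypothetical):

```r
library(tibble)
library(readr)

d_demo <- tibble(year = c(1881, 1882), month = c(1, 1),
                 region = "Bayern", temp = c(-6.51, -1.33))

out_dir <- tempdir()  # replace with your own folder
csv_file <- file.path(out_dir, "air_temp_demo.csv")
rdata_file <- file.path(out_dir, "air_temp_demo.Rdata")

write_csv(d_demo, file = csv_file)
save(d_demo, file = rdata_file)

# Read the CSV back to check that the round trip preserves the values
d_back <- read_csv(csv_file, show_col_types = FALSE)
all.equal(d_back$temp, d_demo$temp)
```

Building the paths with `file.path()` keeps the folder in one place, so only `out_dir` needs to change on another machine.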
10 Precipitation
11 Debrief
This is nice stuff. However, the temperatures provided are deviations from a long-term average, which may be of interest at times but are not as tangible as actual temperatures.
12 Reproducibility
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.0 (2022-04-22)
#> os macOS Big Sur/Monterey 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2022-07-24
#> pandoc 2.18 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)
#> blogdown 1.10 2022-05-10 [1] CRAN (R 4.2.0)
#> bookdown 0.27 2022-06-14 [1] CRAN (R 4.2.0)
#> brio 1.1.3 2021-11-30 [1] CRAN (R 4.2.0)
#> broom 1.0.0 2022-07-01 [1] CRAN (R 4.2.0)
#> bslib 0.3.1 2021-10-06 [1] CRAN (R 4.2.0)
#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)
#> callr 3.7.0 2021-04-20 [1] CRAN (R 4.2.0)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)
#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
#> codetools 0.2-18 2020-11-04 [2] CRAN (R 4.2.0)
#> colorout * 1.2-2 2022-06-13 [1] local
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)
#> dbplyr 2.2.0 2022-06-05 [1] CRAN (R 4.2.0)
#> desc 1.4.1 2022-03-06 [1] CRAN (R 4.2.0)
#> devtools 2.4.3 2021-11-30 [1] CRAN (R 4.2.0)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
#> dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.2.0)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)
#> haven 2.5.0 2022-04-15 [1] CRAN (R 4.2.0)
#> hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)
#> httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.0)
#> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
#> jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)
#> knitr 1.39 2022-04-26 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)
#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
#> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
#> pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.2.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
#> processx 3.6.1 2022-06-17 [1] CRAN (R 4.2.0)
#> ps 1.7.1 2022-06-18 [1] CRAN (R 4.2.0)
#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> readr * 2.1.2 2022-01-30 [1] CRAN (R 4.2.0)
#> readxl 1.4.0 2022-03-28 [1] CRAN (R 4.2.0)
#> remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)
#> rlang 1.0.3 2022-06-27 [1] CRAN (R 4.2.0)
#> rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.0)
#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
#> rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.0)
#> sass 0.4.1 2022-03-23 [1] CRAN (R 4.2.0)
#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
#> testthat 3.1.4 2022-04-26 [1] CRAN (R 4.2.0)
#> tibble * 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
#> tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)
#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)
#> tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.2.0)
#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
#> usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.0)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)
#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
#>
#> [1] /Users/sebastiansaueruser/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────