1 Load packages

2 Course description

Models and modeling are of pivotal importance in many sciences, not only for providing an explanation of nature en miniature (theoretical models), but also for gauging how closely the empirical data at hand match the theoretical model. Translating a theoretical model into statistical language is called statistical modeling, and it is the guiding principle of this introductory course. Regression models will be presented as the lingua franca of statistical modeling, and we will learn that many empirical questions can be analyzed comfortably within a regression framework. Depending on the background and aims of the participants (and time permitting), we will shed light on standard topics such as model comparison, classification models, and typical pitfalls. With a more advanced audience, we will explore how causal and non-causal associations can be translated into, and tested with, simple linear statistical models. Foundational ideas of statistical modeling will be accompanied by short examples and case studies to facilitate transfer and practical application after the course. Time permitting, modern methods of statistical inference will be introduced in an intuitive, largely non-mathematical way. We will use the freely available R environment for all computations. Please bring your own laptop with R and RStudio installed (installation guides are provided). Data and R code will be provided.

3 Course prerequisites

Basic computer skills are needed (downloading materials from the internet, operating a PC, etc.). Basic R knowledge is helpful, as is basic knowledge of statistical concepts such as descriptive statistics. Willingness to learn is essential.

4 Learning objectives

Upon successful completion of this course, students should be able to:

select the right statistical visualization for a variety of data contexts
“crunch” or “wrangle” data
explain what statistical modelling means
formulate basic statistical models
differentiate between predictive and explanatory modelling
apply the methods to their own datasets

5 Course Literature

This course builds on the freely available e-book ModernDive. Each topic is paired with an accompanying chapter from ModernDive. A print edition of the book can be purchased here.

6 Course logistics

This course can be presented as a one-day seminar or split into four blocks.

The course can be held in English or German.

Please bring your own computer and read the notes regarding course logistics in advance. Note that some upfront preparation is needed from the learners.

R and RStudio¹ will be needed throughout the course. Please make sure that your setup is up and running. In case of technical difficulties with R, feel free to use RStudio Cloud; free plans are available.

All learning materials (such as literature, code, data) will be provided in electronic format.

7 UPFRONT student preparation

Install R and RStudio, see ModernDive Chap. 1.1. If R is already installed on your system, please make sure that it is up to date; if it is outdated, download and install the most recent versions of the software. Similarly, hit the “Update” button in RStudio’s “Packages” tab to update your packages if you have not done so for a couple of months.
Students new to R are advised to learn the basics, see ModernDive Chap. 1.2 to 1.5
Sign up for RStudio Cloud
Bring your own laptop
- With R and RStudio installed
- Browse to the textbook website for this course
Make sure your internet connection is stable and your loudspeaker/headset is working; a webcam is helpful.
Students are advised to review the course materials after each session.
I recommend that you carefully check the course description to make sure the course fits your needs (not too advanced/basic).

8 Didactic outline

This course is best thought of as a workshop, in the sense that the instructor takes a dialogue-based approach to teaching and that there are numerous exercises during the course. Instead of giving long talks, the instructor aims to engage students in back-and-forth conversation. Similarly, presenting a large number of PowerPoint slides is avoided. Instead, thorough course literature is freely available online, so that students face no barrier to diving deeply into the materials and ideas presented. During class, however, it is more important to convey the pivotal ideas; the details need to be read and worked through by the students individually, before and after the course. Rather than presenting a lot of text on slides, the course uses an (electronic) whiteboard on which concepts are developed dynamically as the teaching conversation unfolds, thereby adjusting the “dose” of new ideas to the actual pace of instruction.

9 Schedule

9.1 Overview on topics covered

Data Visualization using the grammar of graphics and ggplot2
Data Wrangling based on the tidyverse in R
Basic concepts of statistical modelling
Primer on causal inference
Introduction to regression analysis

9.2 Block 1: Exploratory Data Analysis

9.2.1 Visualization

Data visualization, see ModernDive Chap. 2, and get the R code here
- Exploring common types of statistical diagrams, the “five named graphs” (5NG)
- Discussing when (not) to use diagrams (see Anscombe’s quartet), and when to use which one
- Building elegant graphics in R

9.2.2 Data Wrangling

Data wrangling, see ModernDive Chap. 3, and get the R code here
- A taxonomy of typical data operations
- How to perform common data operations with R
- Summarizing data (aka computing descriptive statistics)

9.2.3 Exercises / Case study

Exercises
- Exercises on life expectancy
- Case study on the visualization of flight delays
- Advanced case study on one-hit wonders
- Visualizing COVID cases
- Case study on nominal data: Survival on the Titanic
- Inspiration for your own project: Visualize COVID-19 cases from this source

9.3 Block 2: Statistical Modelling: Basics

9.3.1 Theory

Basics of modelling, see ModernDive Chap. 5.0, and get the R code here
- What is modelling?
- Basic terminology
- Prediction vs. explanation
Some thoughts on causal inference, see ModernDive Chap. 5.3.1
Regression with one numerical predictor, see ModernDive Chap. 5.1
Regression with one categorical predictor, see ModernDive Chap. 5.2
Assessing model fit (using (adjusted) $R^2$), see ModernDive Chap. 5.3.2
For some tips and tricks on typical issues, see ModernDive tips and tricks
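A minimal sketch of the regression topics listed above, in R. It assumes R's built-in `mtcars` data instead of the course datasets, so it runs without additional packages; the variable choice (`mpg` explained by `wt`, then by `cyl`) is purely illustrative. The middle part recomputes adjusted $R^2$ by hand via $1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $n$ is the number of observations and $p$ the number of predictors.

```r
# Regression with one numerical predictor.
# Uses R's built-in mtcars data (not a course dataset), so no extra packages are needed.
fit <- lm(mpg ~ wt, data = mtcars)  # fuel economy explained by car weight
coef(fit)                           # intercept and slope

fit_summary <- summary(fit)
r2     <- fit_summary$r.squared      # share of variance in mpg explained by wt
adj_r2 <- fit_summary$adj.r.squared  # R^2 penalized for the number of predictors

# Recompute adjusted R^2 by hand: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n <- nrow(mtcars)  # number of observations
p <- 1             # number of predictors (only wt)
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Regression with one categorical predictor works the same way;
# factor() turns the number of cylinders into a grouping variable.
fit_cat <- lm(mpg ~ factor(cyl), data = mtcars)
coef(fit_cat)  # mean of the baseline group (4 cylinders) and differences to it
```

The hand computation should agree with `summary()`, which makes explicit that adjusted $R^2$ only adds a penalty for the number of predictors.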

9.3.2 Case study

Exercises/Case studies:
- Prices of Boston houses, first part
- Modeling movie success, first part

9.4 Block 3: Statistical Modelling: Multiple Regression and interaction

9.4.1 Theory

Slightly more advanced topics on linear regression such as multiple regression and interaction, see ModernDive Chap. 6, and get the R code here
One numerical and one categorical predictor, see ModernDive Chap. 6.1
Two numerical predictors, see ModernDive Chap. 6.2
Simpson’s paradox and more on causal inference, see ModernDive Chap. 6.3.3
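To illustrate the difference between a parallel slopes model and an interaction model, here is a short sketch, again assuming R's built-in `mtcars` data rather than the course datasets. The factor `transmission` is created here for illustration only (from `am`, where 0 means automatic and 1 manual). In the interaction model, the coefficient `wt:transmissionmanual` is the difference in the slope of `wt` between manual and automatic cars.

```r
# One numerical and one categorical predictor (mtcars is built into R).
# Recode am (0/1) as a labelled factor: 0 = automatic, 1 = manual.
cars_df <- transform(
  mtcars,
  transmission = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Parallel slopes model: one common slope for wt, separate intercepts per group
fit_parallel <- lm(mpg ~ wt + transmission, data = cars_df)

# Interaction model: the slope of wt may differ between the two groups
fit_interaction <- lm(mpg ~ wt * transmission, data = cars_df)

coef(fit_parallel)
coef(fit_interaction)  # "wt:transmissionmanual" = difference in slopes

# Two numerical predictors: weight and horsepower
fit_two_num <- lm(mpg ~ wt + hp, data = cars_df)
coef(fit_two_num)
```

Comparing the two coefficient tables shows how adding the interaction term changes the interpretation of the other coefficients.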

9.4.2 Case study

Exercises/Case studies:
- Prices of Boston houses, second part
- Modeling movie success, second part
- Modeling flight delays

9.5 Block 4: Project coaching

This session is dedicated to working on real projects brought in by the students.
In addition, open questions regarding the presented concepts will be discussed.

10 Instructor

Sebastian Sauer works as a professor of Business Psychology at Ansbach University. Analyzing data to answer questions related to social phenomena is one of his major interests. The programming language “R” is one of his favorite tools. He sees himself as a learner and is particularly interested in statistical methods. Open Science is a topic he cares about deeply. He hopes to contribute to addressing pressing social problems such as populism by bringing in his statistical and psychological know-how. He writes a blog, which serves as a sketchpad for his ideas (not immune to updates), at https://data-se.netlify.app/. Sebastian is the author of “Moderne Datenanalyse mit R” [@sauer_moderne_2019]. His publication list is available on Google Scholar.

11 Contact me

Feel free to contact me via email at sebastiansauer1@gmail.com.

12 Assessment and grades

There is no assessment, there are no grades!

13 Talk to me

It’s my goal to make this an excellent course and a stimulating and enjoyable experience for all of us. To find out whether this is happening, I encourage feedback, be it positive or negative, on all aspects of the course at any time. For example, if something I’m doing is making it difficult for you to learn, let me know before it’s too late; if you particularly enjoyed something we did in class, say so, so that we can do it again.

14 Course materials

Most of the materials presented below are available through the course book ModernDive. Please check the relevant chapters of the book before the course to make sure you have all materials available.

15 More resources

15.1 Recommendations

RStudio Cheatsheets, particularly on data wrangling and data visualization
The book R for Data Science, as a handy reference or a serious textbook
Tidy Tuesday video series
Post your open questions on Stack Overflow.
Follow the #rstats hashtag on Twitter.

For students who want to learn more and go deeper than the concepts explored in this course, this book on regression modelling and this book on statistical learning are recommended. German speakers may want to check out my book on modern data analysis.

Suggested literature for deepening your analytic skills includes Statistical Rethinking. For an introduction to graphical causal models, check out Julia Rohrer’s paper. For a more in-depth journey, consider reading this book. While I wholeheartedly recommend these books, we will not be able to discuss many of the ideas presented therein in class due to time constraints.

15.2 R Packages

All R packages are accessible through the course book; please consult the relevant chapters. Please install all R packages used in the course beforehand. Here’s a tutorial on how to install R packages.

The most important R packages for this course are:

tidyverse
gapminder
moderndive
nycflights13
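One way to install the packages listed above in a single step is sketched below; it only installs packages that are not yet present. The CRAN mirror URL is an assumption (the cloud mirror at cloud.r-project.org); inside RStudio a default mirror is usually already configured, so the `repos` argument matters mainly when the script is run via `Rscript`.

```r
# Packages listed above as the most important ones for this course
course_pkgs <- c("tidyverse", "gapminder", "moderndive", "nycflights13")

# Which of them are not yet installed on this machine?
missing_pkgs <- setdiff(course_pkgs, rownames(installed.packages()))

# Install only the missing ones. The CRAN mirror is set explicitly (an assumption)
# so that this also works outside RStudio, e.g. via Rscript.
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Check: names any package that is still missing (ideally character(0))
setdiff(course_pkgs, rownames(installed.packages()))
```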

15.3 Data

All data are accessible through the course book; please consult the relevant chapters.

15.4 Labs (case studies)

Practical data analysis skills can be practiced using these labs; in addition, Chapter 11 provides two case studies. Note that such content may be used as homework.

There are many more case studies scattered across the internet.

15.5 Sketching causal models

Dagitty is a great tool for sketching causal graphs (DAGs); it can be used in your browser or as an R package. Here’s an example of a collider bias. Check out this post for an intuitive explanation.
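The idea behind collider bias can also be checked with a small simulation in base R. This is a sketch under our own assumptions, not the linked example: `x` and `y` are generated independently, and `z` depends on both, so `z` is a collider (x -> z <- y). Without adjustment, the regression of `y` on `x` finds essentially no association; adding the collider `z` as a predictor creates a spurious negative one.

```r
set.seed(42)   # reproducible random numbers
n <- 10000     # large sample, so the estimates are stable

x <- rnorm(n)          # first cause
y <- rnorm(n)          # second cause, generated independently of x
z <- x + y + rnorm(n)  # collider: x -> z <- y

# Without the collider: the slope of x is close to 0, as it should be
b_naive <- coef(lm(y ~ x))[["x"]]

# "Controlling" for the collider: the slope of x becomes clearly negative
# (about -0.5 under this particular data-generating process)
b_collider <- coef(lm(y ~ x + z))[["x"]]

round(c(naive = b_naive, adjusted_for_collider = b_collider), 2)
```

The value of about -0.5 follows from the assumed variances (all noise terms have variance 1); other settings give a different, but still spurious, association.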

16 Where are the slides?

There are none. I feel that slides are not optimal for learning. In class, slides can be detrimental if they are too wordy, because they distract from the dialogue with the instructor, and I consider this very dialogue essential. Outside of class, slides are not helpful either. Instead, a good book is much more beneficial, because a book offers enough room to explain things patiently and in sufficient detail, something a slide deck cannot do.

To underline my messages to you, dear learners, I will use some sketches on a virtual whiteboard, some interactive apps, live coding, and some (pre-prepared) diagrams. That’s a bit similar to what happens at Khan Academy. You might have noticed that many courses at Coursera follow a similar approach.

I readily confess that this approach is novel to many learners these days, who are accustomed to hundreds of PowerPoint slides. Please be open to it; I think you will come to appreciate this didactic style.

¹ Desktop version, not the server.

Introduction to Statistics: A modeling-based approach -- Course Syllabus