- 1 Load packages
- 2 Course description
- 3 Course prerequisites
- 4 Learning objectives
- 5 Course Literature
- 6 Course logistics
- 7 UPFRONT student preparation
- 8 Didactic outline
- 9 Schedule
- 10 Instructor
- 11 Contact me
- 12 Assessment and grades
- 13 Talk to me
- 14 Course materials
- 15 More resources
- 16 Where are the slides?
1 Load packages
2 Course description
Models and modeling are of pivotal importance in many sciences, not only for providing an explanation of nature en miniature (theoretical models), but also for gauging how closely the empirical data at hand match the theoretical model. Translating a theoretical model into statistical language is called statistical modeling, and it provides the guiding principle of this introductory course. Regression models will be presented as a lingua franca of statistical modeling, and we will learn that many empirical questions can (comfortably) be analyzed using a regression framework. Depending on the background and aims of the participants (and time permitting), we will shed light on some standard topics such as model comparison, classification models, and typical pitfalls. With a more advanced audience, we will explore how causal and non-causal associations can be translated and tested using simple linear statistical models. Foundational ideas of statistical modeling will be accompanied by short examples and case studies to facilitate transfer and practical application after the course. Time permitting, modern methods of statistical inference will be introduced in an intuitive, rather non-mathematical way. We will use the R environment (freely available) for all computations. Please bring your own laptop with R and RStudio installed (installation guides are provided). Data and R code will be provided.
3 Course prerequisites
Basic computer skills are needed (downloading materials from the internet, operating a PC, etc.). Basic R knowledge is helpful. Basic knowledge of statistical concepts (such as descriptive statistics) is helpful. Willingness to learn is essential.
4 Learning objectives
Upon successful completion of this course, students should be able to:
- select the right statistical visualization for a variety of data contexts
- “crunch” or “wrangle” data
- explain what statistical modelling means
- formulate basic statistical models
- differentiate between predictive and explanatory modelling
- apply the methods to their own datasets
5 Course Literature
This course builds on the freely available e-book ModernDive. Each topic is paralleled by an accompanying chapter from ModernDive. A print copy of the book can be purchased here.
6 Course logistics
This course can be presented as a one-day seminar or split into four blocks.
The course can be held in English or German.
Please bring your own computer and read the notes regarding course logistics in advance. Note that some upfront preparation is needed from the learners.
R and RStudio [1] will be needed throughout the course. Please make sure that both are up and running on your computer. In case of technical difficulties with R, feel free to use RStudio Cloud; free plans are available.
All learning materials (such as literature, code, data) will be provided in electronic format.
7 UPFRONT student preparation
- Install R and RStudio, see ModernDive Chap. 1.1. If R is already installed on your system, please make sure that it is up to date. If it is outdated, download and install the most recent versions of the software. Similarly, hit the “Update” button in RStudio’s “Packages” tab to update your packages if you have not done so for a couple of months.
- Students new to R are advised to learn the basics, see ModernDive Chap. 1.2 to 1.5
- Sign up for RStudio Cloud
- Bring your own laptop with R and RStudio installed
- Browse to the textbook website for this course
- Make sure your internet connection is stable and your loudspeaker/headset is working; a webcam is helpful.
- Students are advised to review the course materials after each session.
- I recommend that you carefully check the course description to make sure the course fits your needs (not too advanced/basic).
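To check whether your installation is reasonably current, a few lines in the R console may help. This is a minimal sketch: the threshold `"4.0.0"` is an illustrative assumption and not an official requirement of this course.

```r
# Report the R version that is currently running.
r_version <- R.version.string
print(r_version)

# getRversion() returns a version object that can be compared directly.
# NOTE: "4.0.0" is an assumed threshold for illustration only.
min_version <- "4.0.0"
r_is_recent <- getRversion() >= min_version
print(r_is_recent)

# Updating all installed packages needs an internet connection,
# so the call is left commented out here:
# update.packages(ask = FALSE)
```

If `r_is_recent` is `FALSE`, installing a fresh copy of R (and then RStudio) is probably the simplest fix.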
8 Didactic outline
This course is best considered a workshop, in the sense that the instructor uses a dialogue-based approach to teaching and that there are numerous exercises during the course. Instead of giving long talks, the instructor feels obligated to engage students in back-and-forth conversations. Similarly, the presentation of a large number of PowerPoint slides is avoided. Instead, thorough course literature is available (free online), so that students face no barrier to diving deeply into the materials and ideas presented. During class, however, it is more important to convey the pivotal ideas; details need to be read and worked through by the students individually after (and before) the course. As an alternative to presenting a lot of text on slides, this course uses an (electronic) whiteboard on which concepts are developed dynamically and in step with the teaching conversation, thereby adjusting the “dose” of new thoughts to the actual pace of the instruction.
9 Schedule
9.1 Overview of topics covered
- Data Visualization using the grammar of graphics and ggplot2
- Data Wrangling based on the tidyverse in R
- Basic concepts of statistical modelling
- Primer on causal inference
- Introduction to regression analysis
9.2 Block 1: Exploratory Data Analysis
9.2.1 Visualization
- Data visualization, see ModernDive Chap. 2, and get the R code here
- Exploring common types of statistical diagrams, the “5NG”
- Discussing when (not) to use diagrams (see Anscombe’s Quartet), and when to use which one
- Building elegant graphics in R
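Why diagrams matter can be illustrated with Anscombe’s Quartet, which ships with base R as the dataset `anscombe`. The following sketch is not taken from the course materials; the helper name `plot_quartet` is hypothetical, and base graphics are used here only to keep the example free of extra packages (the course itself uses ggplot2).

```r
# `anscombe` is part of base R (package datasets): columns x1..x4 and y1..y4.
data(anscombe)

# Compute the same summary statistics for each of the four x/y pairs.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(set = i,
             mean_x = mean(x),
             mean_y = mean(y),
             sd_y   = sd(y),
             cor_xy = cor(x, y))
}))

# The four rows are nearly identical, although the data look very different.
print(round(quartet_stats, 2))

# Hypothetical helper: draw the four scatterplots in a 2 x 2 grid.
plot_quartet <- function() {
  old_par <- par(mfrow = c(2, 2))
  on.exit(par(old_par))  # restore the previous graphics settings
  for (i in 1:4) {
    plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
         xlab = paste0("x", i), ylab = paste0("y", i),
         main = paste("Set", i))
  }
}
```

Calling `plot_quartet()` in an interactive session shows four clearly different patterns behind almost the same numbers, which is the argument for plotting data before summarizing them.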
9.2.2 Data Wrangling
- Data wrangling, see ModernDive Chap. 3, and get the R code here
- A taxonomy of typical data operations
- How to perform common data operations with R
- Summarizing data (aka computing descriptive statistics)
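The typical data operations listed above (filtering rows, creating columns, summarizing by group) can be sketched as follows. To stay self-contained, this example assumes the built-in `mtcars` data and uses base R only; the tidyverse counterparts used in the course are noted in comments.

```r
# mtcars ships with base R; it stands in for the course data here.
cars <- mtcars

# Create a new column: weight in kilograms (wt is recorded in 1000 lbs).
# tidyverse counterpart: dplyr::mutate(cars, wt_kg = wt * 1000 * 0.45359237)
cars$wt_kg <- cars$wt * 1000 * 0.45359237

# Filter rows: keep cars with more than 100 horsepower.
# tidyverse counterpart: dplyr::filter(cars, hp > 100)
strong_cars <- cars[cars$hp > 100, ]

# Summarize by group: mean miles per gallon for each number of cylinders.
# tidyverse counterpart:
#   cars |> dplyr::group_by(cyl) |> dplyr::summarize(mpg = mean(mpg))
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# Arrange rows: sort the summary from most to least efficient.
mpg_by_cyl <- mpg_by_cyl[order(-mpg_by_cyl$mpg), ]
print(mpg_by_cyl)
```

The grouped mean is exactly the kind of descriptive statistic meant by "summarizing data"; the other steps prepare the table for it.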
9.2.3 Exercises / Case study
- Exercises
- Exercises on life expectancy.
- Case study on the visualization of flight delays
- Advanced case study on one hit wonders
- Visualization of Covid-19 cases
- Case study on nominal data: Survival on the Titanic
- Inspiration for own project: Visualize Covid-19 cases from this source.
9.3 Block 2: Statistical Modelling: Basics
9.3.1 Theory
Basics of modelling, see ModernDive Chap. 5.0, and get the R code here
- What is modelling?
- Basic terminology
- Prediction vs. explanation
Some thoughts on causal inference, see ModernDive Chap. 5.3.1
Regression with one numerical predictor, see ModernDive Chap. 5.1
Regression with one categorical predictor, see ModernDive Chap. 5.2
Assessing model fit (using (adjusted) \(R^2\)), see ModernDive Chap. 5.3.2
For some tips and tricks on typical issues, see ModernDive tips and tricks
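The theory topics of this block can be previewed with a short sketch. It assumes the built-in `mtcars` data as a stand-in for the course data and uses base R’s `lm()`; the variable names are chosen for illustration and do not come from ModernDive.

```r
# Regression with one numerical predictor: fuel efficiency explained by weight.
model_num <- lm(mpg ~ wt, data = mtcars)
print(coef(model_num))  # intercept and slope

# Regression with one categorical predictor: transmission type as a factor.
# In mtcars, am = 0 means automatic and am = 1 means manual.
cars <- transform(mtcars,
                  transmission = factor(am, labels = c("automatic", "manual")))
model_cat <- lm(mpg ~ transmission, data = cars)
# The intercept is the mean of the baseline group (automatic);
# the second coefficient is the difference between the two group means.
print(coef(model_cat))

# Assessing model fit with R^2 and adjusted R^2.
fit_num <- summary(model_num)
r2      <- fit_num$r.squared
adj_r2  <- fit_num$adj.r.squared
print(c(r2 = r2, adj_r2 = adj_r2))

# Prediction: expected mpg for a hypothetical car weighing 3000 lbs (wt = 3).
new_car <- data.frame(wt = 3)
predicted_mpg <- predict(model_num, newdata = new_car)
print(predicted_mpg)
```

Reading the coefficients is the explanatory use of the model, while the `predict()` call is the predictive use; both rest on the same fitted object.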
9.3.2 Case study
- Exercises/Case studies:
- Prices of Boston houses, first part
- Modeling movie success, first part
9.4 Block 3: Statistical Modelling: Multiple Regression and interaction
9.4.1 Theory
Slightly more advanced topics on linear regression such as multiple regression and interaction, see ModernDive Chap. 6, and get the R code here
One numerical and one categorical predictor, see ModernDive Chap. 6.1
Two numerical predictors, see ModernDive Chap. 6.2
Simpson’s paradox and more on causal inference, see ModernDive Chap. 6.3.3
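Simpson’s paradox and the role of a categorical predictor can be made tangible with simulated data. The sketch below is an illustration under stated assumptions: the data are invented (two groups, a within-group slope of -0.5, group offsets of 10 and 25), and none of the numbers come from the course book.

```r
set.seed(42)  # make the simulated data reproducible

# Two groups: group B has larger x values AND a higher baseline for y.
n     <- 50
group <- rep(c("A", "B"), each = n)
x     <- c(runif(n, 0, 10), runif(n, 10, 20))
y     <- ifelse(group == "A", 10, 25) - 0.5 * x + rnorm(2 * n, sd = 1)
sim   <- data.frame(x = x, y = y, group = factor(group))

# Model 1: ignore the group. The pooled slope of x comes out positive.
pooled <- lm(y ~ x, data = sim)

# Model 2: one numerical and one categorical predictor (parallel slopes).
# Within the groups, the slope of x is negative.
parallel <- lm(y ~ x + group, data = sim)

# Model 3: interaction model, which allows a separate slope per group.
interaction_model <- lm(y ~ x * group, data = sim)

print(coef(pooled))
print(coef(parallel))
print(coef(interaction_model))
```

The sign of the slope flips once the group is taken into account, which is why thinking about the causal structure should come before choosing the predictors.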
9.4.2 Case study
- Exercises/Case studies:
- Prices of Boston houses, second part
- Modeling movie success, second part
- Modeling flight delays
9.5 Block 4: Project coaching
This session is dedicated to working on real projects brought in by the students.
In addition, open questions regarding the presented concepts will be discussed.
10 Instructor
Sebastian Sauer works as a professor of Business Psychology at Ansbach University. Analyzing data to answer questions related to social phenomena is one of his major interests. The programming language “R” is one of his favorite tools. He sees himself as a learner and is particularly interested in statistical methods. Open Science is a hot topic for him. He hopes to contribute to solving pressing social problems such as populism by bringing in his statistical and psychological know-how. He writes a blog, which serves as a sketchpad for stuff on his mind (not immune to thought updates), at https://data-se.netlify.app/. Sebastian is the author of “Moderne Datenanalyse mit R” [@sauer_moderne_2019]. His publication list is available on Google Scholar.
11 Contact me
Feel free to contact me via email at sebastiansauer1@gmail.com.
12 Assessment and grades
There is no assessment, and there are no grades!
13 Talk to me
It’s my goal to make this an excellent course and a stimulating and enjoyable experience for all of us. So that I can find out if this is happening, I encourage feedback—be it positive or negative—on all aspects of the course at any time. For example, if something I’m doing is making it difficult for you to learn, then let me know before it’s too late; if you particularly enjoyed something we did in class, say so so that we can do it again.
14 Course materials
Most of the materials presented below are made available through the course book ModernDive. Please check the relevant chapters of the book before the course to make sure you have all materials available.
15 More resources
15.1 Recommendations
RStudio Cheatsheets, particularly on data wrangling and data visualization
Book R for Data Science as a handy reference or a serious textbook.
Post your open questions on Stack Overflow.
Follow the #rstats hashtag on Twitter.
For students willing to learn more and go deeper (than the concepts explored in the present course), this book on regression modelling, and this book on statistical learning are recommended. For German folks, check out my book on modern data analysis.
Suggested literature for deepening your analytic skills includes Statistical Rethinking. For an introduction to graphical causal models, check out Julia Rohrer’s paper. For a more in-depth journey, consider reading this book. While I wholeheartedly recommend such books, we will not be able to discuss many of the ideas presented therein in class (in this course) due to time constraints.
15.2 R Packages
All R packages are accessible through the course book; please consult the relevant chapters. Please install all R packages used before the course. Here’s a tutorial on how to install R packages.
The most important R packages for this course are:
- tidyverse
- gapminder
- moderndive
- nycflights13
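Installing the four packages listed above can be done in one go. The following is a minimal sketch, assuming that the packages are installed from CRAN with default settings; the `interactive()` guard is an illustrative choice so that nothing is downloaded when the script runs unattended.

```r
# The packages named in this syllabus.
course_pkgs <- c("tidyverse", "gapminder", "moderndive", "nycflights13")

# Find out which of them are not installed yet.
installed    <- rownames(installed.packages())
missing_pkgs <- setdiff(course_pkgs, installed)
print(missing_pkgs)

# Install only what is missing (needs an internet connection).
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}

# Once installed, load the packages at the start of each session, e.g.:
# library(tidyverse)
# library(moderndive)
```

Installing is needed only once per computer, whereas loading with `library()` is needed in every new R session.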
15.3 Data
All data are accessible through the course book; please consult the relevant chapters.
15.4 Labs (case studies)
Practical data analysis skills can be practiced using these labs; in addition, Chapter 11 provides two case studies. Note that such content may be used as homework.
There are a lot of case studies scattered across the internet.
16 Where are the slides?
There are none. I feel that slides are not optimal for learning. In class, slides can be detrimental if they are too wordy, because that distracts from the dialogue with the instructor, and I hold this very dialogue to be essential. Outside of class, slides are not helpful either. Instead, a good book is much more beneficial, because a book has enough room to explain patiently and in sufficient detail, an endeavor that is impossible for a slide deck.
To underline my messages to you, dear learners, I will use some sketches on a virtual whiteboard, some interactive apps, live coding, and some (pre-prepared) diagrams. That’s a bit similar to what happens at Khan Academy. You might have noticed that many courses at Coursera follow a similar approach.
I readily confess that this approach is novel to many learners these days, who are accustomed to hundreds of PowerPoint slides. Please be open to it; I think you will appreciate this didactic style.
[1] Desktop version, not the server