Introductory books for data analysis

One way to dig into a topic such as data analysis is simply doing it: trial and error. Another way is reading blogs, a fruitful avenue in my experience. However, the classical way of reading a good book is anything but outdated.

Here are some books I found helpful for getting started (in English and German).

R for Data Science

Grolemund, G., & Wickham, H. (2016). R for Data Science. O’Reilly Media, Incorporated. Retrieved from

Hadley Wickham did it again. This book is a great resource for building a solid foundation in data analysis using R. Two things are special. First, of course, the philosophy of “tidy data” runs as a central thread through the book. That comes as no surprise, as Hadley is the major prophet of this way of thinking (a great way, by the way). Second, the book is “modern” in the sense that newly developed, but thoughtfully devised and already quite popular, approaches are put forward. Note that the book is not intended to go into the full depth of the R language, of programming, or of statistical learning. Most notably, whereas the title suggests a hefty dose of statistics, this expectation is not met: statistical modeling is a rather small topic in the book.

Introduction to Statistical Learning

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York City: Springer. DOI:

A classic introduction, aimed at learners who are not so mathematically inclined (or trained). It covers important principles of statistical learning as well as widely used algorithms. R serves as the computational environment for all models and computations. Datasets, exercises, and case studies (as well as an R package) come bundled with the book. A great start into a deep data dive. Great extra: the PDF of the book is freely available on the web.

R - Einführung durch angewandte Statistik

Hatzinger, R., Hornik, K., & Nagel, H. (2011). R: Einführung durch angewandte Statistik. Hallbergmoos: Pearson Studium. Retrieved from

This book is intended for (native) German-speaking audiences, especially in college courses and the like. It offers a broad perspective on “typical” statistics using R. Its curriculum closely mirrors typical college curricula, something that teachers and professors will likely welcome. However, the book is not modern, in the sense that more recent R approaches such as the “tidyverse” are not discussed. A critical discussion of the p-value, a presentation of alternatives, some reflection on open science, and more on statistical learning would have helped to render a good book great, in my opinion. Still, for getting a grip on standard statistics using R, this book is a valuable resource.