Motivation
Simple linear regression is a standard tool in data analysis and statistics. Its properties are well known, but applied analysts do not always know them in detail, which is fine. However, if you want to understand the internals more deeply, the question arises of how the regression coefficients are derived. Here’s one way.
This approach relies only on basic calculus and derivatives: no matrix algebra, and only the simple case of one predictor.
Many sources and tutorials similar to this one are available elsewhere.
Here’s the workhorse
Simple linear regression is defined such that

$$\hat{y}_i = \beta_0 + \beta_1 x_i$$

where $\beta_0$ is the intercept and $\beta_1$ is the slope. That’s the regression line.
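To make the regression line concrete, here is a minimal sketch in R. It assumes the built-in mtcars data with hp as predictor and mpg as outcome; the names b0, b1 and y_hat are chosen here for illustration only.

```r
# Fit the model on the built-in mtcars data.
fit <- lm(mpg ~ hp, data = mtcars)

# Extract intercept (beta_0) and slope (beta_1); unname() drops the labels.
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])

# Compute the regression line by hand: y_hat = beta_0 + beta_1 * x
y_hat <- b0 + b1 * mtcars$hp

# Compare with the predictions that R computes itself.
all.equal(y_hat, unname(predict(fit)))
```

If the two agree, the fitted values are nothing more than intercept plus slope times the predictor.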
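The regression line is optimal in the sense that it minimizes the sum of the squared vertical distances between the observed points and the line. These distances are the residuals, drawn as faded segments in the following plot (code from this source):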
library(ggplot2)

d <- mtcars
fit <- lm(mpg ~ hp, data = d)
d$predicted <- predict(fit)   # Save the predicted values
d$residuals <- residuals(fit) # Save the residual values

ggplot(d, aes(x = hp, y = mpg)) +
  geom_smooth(method = "lm", se = FALSE, color = "lightgrey") + # Plot regression line
  geom_segment(aes(xend = hp, yend = predicted), alpha = .2) +  # Residuals; alpha to fade lines
  geom_point() +                                                # Observed values
  geom_point(aes(y = predicted), shape = 1) +                   # Predicted values (hollow circles)
  theme_bw()                                                    # Add theme for cleaner look
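In other words, the regression line minimizes this cost function, $J$, the sum of the squared residuals:

$$J = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Substituting $\hat{y}_i = \beta_0 + \beta_1 x_i$:

$$J(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$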
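The cost function can be written as a small R function. This is a sketch under the assumption that we keep using mtcars (hp as predictor, mpg as outcome); the function name cost and the size of the perturbations are arbitrary choices for illustration.

```r
# Sum of squared residuals for a candidate intercept b0 and slope b1.
cost <- function(b0, b1, x, y) {
  sum((y - b0 - b1 * x)^2)
}

x <- mtcars$hp
y <- mtcars$mpg

# Coefficients found by lm(), used as the reference solution.
fit <- lm(mpg ~ hp, data = mtcars)
b <- unname(coef(fit))

cost_ols     <- cost(b[1], b[2], x, y)          # cost at the lm() solution
cost_shifted <- cost(b[1] + 1, b[2], x, y)      # intercept moved by 1
cost_tilted  <- cost(b[1], b[2] + 0.01, x, y)   # slope moved by 0.01

c(ols = cost_ols, shifted = cost_shifted, tilted = cost_tilted)
```

Moving either coefficient away from the lm() solution should increase the cost, which is what "minimizes the cost function" means in practice.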
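To minimize a function, we take its first derivative and set it to zero. For a function of more than one variable, we take the partial derivative with respect to each parameter. Let’s begin with $\beta_0$, writing $J$ for the cost function:

$$\frac{\partial J}{\partial \beta_0} = \frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

Arg, now what? Chain rule to the rescue!
- Differentiate the outer function (the square) first.
- Evaluate that derivative at the inner function, $y_i - \beta_0 - \beta_1 x_i$.
- Differentiate the inner function and multiply by what we already have.
Note that the derivative of the inner function with respect to $\beta_0$ simplifies to $-1$.

$$\frac{\partial J}{\partial \beta_0} = \sum_{i=1}^{n} 2 (y_i - \beta_0 - \beta_1 x_i) \cdot (-1) = \sum_{i=1}^{n} -2 (y_i - \beta_0 - \beta_1 x_i)$$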
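The chain-rule result can be checked numerically. The sketch below compares the analytic partial derivative with a central finite difference at an arbitrarily chosen point; the point (b0 = 20, b1 = -0.05), the step size h and the function names are assumptions made for this illustration, again using mtcars.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Cost function: sum of squared residuals.
cost <- function(b0, b1) sum((y - b0 - b1 * x)^2)

# Analytic partial derivative w.r.t. the intercept, from the chain rule.
d_cost_d_b0 <- function(b0, b1) sum(-2 * (y - b0 - b1 * x))

# Arbitrary point at which to compare both versions.
b0 <- 20
b1 <- -0.05

# Central finite difference as a numeric approximation of the derivative.
h <- 1e-4
numeric_slope <- (cost(b0 + h, b1) - cost(b0 - h, b1)) / (2 * h)

c(analytic = d_cost_d_b0(b0, b1), numeric = numeric_slope)
```

Both numbers should agree up to floating-point error, since the cost is quadratic in the intercept.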
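Set the derivative to zero to find the minimum:

$$\sum_{i=1}^{n} -2 (y_i - \beta_0 - \beta_1 x_i) = 0$$

Now pull the $-2$ out of the summation and divide both sides of the equation by $-2$:

$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0$$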
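This condition says that the residuals of the fitted line sum to zero. A quick check in R, assuming the mtcars model from before and treating anything on the order of floating-point error as zero:

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Sum of the residuals y_i - beta_0 - beta_1 * x_i at the fitted coefficients.
resid_sum <- sum(residuals(fit))
resid_sum
```

The result will not be printed as an exact 0, but it should be tiny compared with the scale of mpg.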
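Multiply out the summation:

$$\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \beta_0 - \beta_1 \sum_{i=1}^{n} x_i = 0$$

Note that the sum of a constant is $n$ times the constant: $\sum_{i=1}^{n} \beta_0 = n \beta_0$.
Using the above term we get:

$$\sum_{i=1}^{n} y_i - n \beta_0 - \beta_1 \sum_{i=1}^{n} x_i = 0$$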
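Now solve for $\beta_0$:

$$\beta_0 = \frac{\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i}{n} = \bar{y} - \beta_1 \bar{x}$$

Here $\bar{y} = \frac{1}{n} \sum y_i$ and $\bar{x} = \frac{1}{n} \sum x_i$ are simply the means of $y$ and $x$!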
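The intercept formula is easy to verify in R. This sketch takes the slope from lm() as given (it is derived by hand in the next section) and rebuilds the intercept from the two means; mtcars and the name b0_by_hand are assumptions for illustration.

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Take the slope from the fitted model.
b1 <- unname(coef(fit)["hp"])

# Intercept from the derived formula: beta_0 = mean(y) - beta_1 * mean(x)
b0_by_hand <- mean(mtcars$mpg) - b1 * mean(mtcars$hp)

# Compare with the intercept reported by lm().
all.equal(b0_by_hand, unname(coef(fit)["(Intercept)"]))
```

A consequence of this formula is that the regression line always passes through the point of means.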
Finding $\beta_1$
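Get the partial derivative of the cost function $J$ w.r.t. $\beta_1$, again using the chain rule. This time the derivative of the inner function is $-x_i$:

$$\frac{\partial J}{\partial \beta_1} = \sum_{i=1}^{n} 2 (y_i - \beta_0 - \beta_1 x_i) \cdot (-x_i)$$

That gives us, set to zero:

$$\sum_{i=1}^{n} -2 x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$

Divide by $-2$:

$$\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$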
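In words, this condition says that the residuals, weighted by the predictor values, sum to zero. A short check in R, assuming the mtcars model used earlier and allowing for floating-point error:

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Sum of x_i * (y_i - beta_0 - beta_1 * x_i) at the fitted coefficients.
weighted_resid_sum <- sum(mtcars$hp * residuals(fit))
weighted_resid_sum
```

As with the plain sum of residuals, expect a value very close to zero rather than an exact 0.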
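Distribute $x_i$ over the terms in the parentheses of $\sum x_i (y_i - \beta_0 - \beta_1 x_i) = 0$:

$$\sum_{i=1}^{n} (x_i y_i - \beta_0 x_i - \beta_1 x_i^2) = 0$$

As we already know that $\beta_0 = \bar{y} - \beta_1 \bar{x}$, let’s substitute that back in:

$$\sum_{i=1}^{n} (x_i y_i - (\bar{y} - \beta_1 \bar{x}) x_i - \beta_1 x_i^2) = 0$$

Split into two sums:

$$\sum_{i=1}^{n} (x_i y_i - \bar{y} x_i) + \sum_{i=1}^{n} (\beta_1 \bar{x} x_i - \beta_1 x_i^2) = 0$$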
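We would like to isolate $\beta_1$, so let’s factor it out of the second sum:

$$\sum_{i=1}^{n} (x_i y_i - \bar{y} x_i) + \beta_1 \sum_{i=1}^{n} (\bar{x} x_i - x_i^2) = 0$$

To finally isolate $\beta_1$, subtract the first sum from both sides and divide by the second:

$$\beta_1 = \frac{-\sum_{i=1}^{n} (x_i y_i - \bar{y} x_i)}{\sum_{i=1}^{n} (\bar{x} x_i - x_i^2)} = \frac{\sum_{i=1}^{n} (x_i y_i - \bar{y} x_i)}{\sum_{i=1}^{n} (x_i^2 - \bar{x} x_i)}$$

Yeah! Here we go, there we have $\beta_1$.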
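As a final check, both coefficients can be computed by hand and compared with lm(). This is a sketch assuming mtcars with hp as predictor and mpg as outcome; the variable names are illustrative. It also uses the well-known equivalent form of the slope, the covariance of x and y divided by the variance of x.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Slope from the derived formula.
b1 <- sum(x * y - mean(y) * x) / sum(x^2 - mean(x) * x)

# Intercept from the formula derived earlier.
b0 <- mean(y) - b1 * mean(x)

# Reference solution from lm().
fit <- lm(mpg ~ hp, data = mtcars)
all.equal(c(b0, b1), unname(coef(fit)))

# Equivalent form of the slope: covariance over variance.
b1_cov <- cov(x, y) / var(x)
all.equal(b1, b1_cov)
```

The covariance form works because the factor 1 / (n - 1) appears in both cov() and var() and cancels out.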