Setup
library(tidyverse)
data(mtcars)
Using a log-Y regression
Gelman et al., in “Regression and Other Stories”, state that “when additivity and linearity are not reasonable assumptions” it may make sense to “take the logarithms of outcomes that are all positive” (p. 189).
A log-y regression can be defined as follows, in the simplest case:
\[\log y = b_0 + b_1X_1 + \ldots + \epsilon\]
Exponentiating both sides yields
\[y = e^{b_0 + b_1X_1 + \ldots + \epsilon}\]
This is a multiplicative model, which is perhaps easier to see in this notation:
\[y = e^{b_0} \cdot e^{b_1X_1} \cdot \ldots \cdot e^{\epsilon}\]
In essence, we can use log-y regression to model multiplicative associations between x and y.
A multiplicative association also means that the relationship between x and y is not linear.
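To see why a one-unit change in a predictor acts as a factor on \(y\), a short derivation may help. It is only a sketch and assumes that all other predictors and the error term are held fixed while \(X_1\) increases by one:

\[\frac{y(X_1 + 1)}{y(X_1)} = \frac{e^{b_0 + b_1(X_1 + 1) + \ldots + \epsilon}}{e^{b_0 + b_1X_1 + \ldots + \epsilon}} = e^{b_1}\]

Under that assumption the ratio does not depend on the starting value of \(X_1\), so each additional unit multiplies \(y\) by the same factor \(e^{b_1}\).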
Let’s have a look at an example.
m1 <- lm(log(mpg) ~ hp, data = mtcars)
coef(m1)
## (Intercept) hp
## 3.460466874 -0.003428734
The coefficient of the predictor is reported as -0.0034287. Exponentiating this value yields approximately:
b1_approx <- coef(m1)[2] %>% exp()
b1_approx
## hp
## 0.9965771
Thus, one more unit in \(X\) multiplies \(Y\) by a factor of exp(b1).
In this case, exp(b1) amounts to 0.9965771.
Of course, factors below one indicate a decrease in \(Y\).
In this example, we find a decrease of approx. 0.34% in y associated with a one-unit increase in x.
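The factor interpretation can be checked numerically. The following is a minimal sketch, not part of the original analysis: it refits the same model and compares the ratio of two back-transformed predictions one unit of hp apart with the exponentiated slope. The values 100 and 101 and the names `new_hp`, `ratio`, and `factor_b1` are arbitrary choices made for illustration.

```r
# Refit the model from the text so this block is self-contained
data(mtcars)
m1 <- lm(log(mpg) ~ hp, data = mtcars)

# Two illustrative hp values exactly one unit apart (arbitrary choice)
new_hp <- data.frame(hp = c(100, 101))

# predict() returns values on the log scale; exp() maps them back to mpg
pred_mpg <- exp(predict(m1, newdata = new_hp))

# Ratio of the two predictions and the exponentiated slope
ratio <- unname(pred_mpg[2] / pred_mpg[1])
factor_b1 <- unname(exp(coef(m1)["hp"]))

ratio
factor_b1

# Percentage change in predicted mpg per additional unit of hp
(factor_b1 - 1) * 100
```

Both quantities should agree up to floating-point error, whichever pair of neighbouring hp values is chosen.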
Let’s have a look at the predictions:
pred_grid <-
  tibble(
    hp = 0:400,
    # predictions are on the log scale
    pred = predict(m1, newdata = tibble(hp = hp))
  ) %>%
  # back-transform to the original mpg scale
  mutate(pred_exp = exp(pred))
ggplot(mtcars) +
aes(x = hp, y = mpg) +
geom_point() +
geom_line(data = pred_grid,
aes(x = hp, y = pred_exp))
There’s a useful shortcut to interpreting log-y regression coefficients.
Compare the following (small) values and their respective exponentiated values:
dat <-
tibble(
xs = seq(-.5, .5, by = .1),
xs_exp = exp(xs)
)
ggplot(dat) +
aes(x = xs, y = xs_exp) +
geom_line() +
geom_abline(slope = 1, intercept = 1)
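The reference line in the plot (slope 1, intercept 1) is the line 1 + x. As a rough numerical companion to the figure, the sketch below tabulates exp(x) next to 1 + x for the same grid of values. The column names `exact`, `linear`, and `abs_error` are made up for this illustration.

```r
# Same grid of small values as in the plot
xs <- seq(-0.5, 0.5, by = 0.1)

approx_tab <- data.frame(
  xs = xs,
  exact = exp(xs),   # exponentiated value
  linear = 1 + xs    # the reference line drawn by geom_abline()
)

# Absolute gap between the exact value and the linear reference line
approx_tab$abs_error <- abs(approx_tab$exact - approx_tab$linear)

approx_tab
```

The gap is small for values close to zero and grows towards the ends of the grid, which is why the comparison is restricted to small values.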