Simulating data for a Gamma regression

Load packages

library(tidyverse)

Intro

The Gamma distribution is a continuous distribution that is useful for modeling positive, right-skewed data such as waiting times.

In this post, we’ll illustrate some properties of the Gamma distribution by simulating a toy example.
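As a quick visual sketch (the shape and scale values here are arbitrary and are not part of the simulation below), we can plot a few Gamma densities; for shape \(k\) and scale \(\theta\), the mean is \(k\theta\) and the variance is \(k\theta^2\).

# A few Gamma densities with arbitrary shapes (scale fixed at 1);
# mean = shape * scale, variance = shape * scale^2
curve(dgamma(x, shape = 1, scale = 1), from = 0, to = 12, ylab = "density")
curve(dgamma(x, shape = 2, scale = 1), add = TRUE, lty = 2)
curve(dgamma(x, shape = 5, scale = 1), add = TRUE, lty = 3)
legend("topright", legend = c("shape = 1", "shape = 2", "shape = 5"), lty = 1:3)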

Simulate data and define structural model

Let \(X\) be a discrete variable following a uniform distribution on \(\{1, 2, 3\}\), so that \(x_i \in \{1,2,3\}\).

set.seed(42)

n <- 1000
X <- sample(x = c(1,2,3), size = n, replace = TRUE)

hist(X)

Let \(y_i = 0.1 + 1 x_i\), and, adding noise, let \(\hat{y}_i = 0.1 + 1 x_i + \epsilon_i\), where \(\epsilon_i \sim \Gamma(m = 0.1, \, s = 1/m)\) and \(\Gamma\) refers to the Gamma distribution with shape \(m\) and scale \(s\).

e <- rgamma(n, shape = 0.1, scale = 1/0.1)
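As a quick sanity check (not strictly necessary), the sample mean of e should be close to shape × scale = 0.1 × 10 = 1, and the sample variance close to shape × scale² = 10:

# theoretical mean = shape * scale = 1; theoretical variance = shape * scale^2 = 10
mean(e)
var(e)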

Compute \(Y\) and \(\hat{Y}\):

Y <- 0.1 + 1 * X
Y_hat <- Y + e
hist(Y_hat)

plot(X, Y_hat)
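Since the tidyverse is already loaded, here is an optional ggplot2 view of the same data that makes the right skew within each level of X easier to see:

# density of Y_hat within each level of X; the long right tail comes from e
tibble(X = factor(X), Y_hat = Y_hat) %>%
  ggplot(aes(x = Y_hat, fill = X)) +
  geom_density(alpha = 0.4)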

Regression

glm1 <- glm(Y_hat ~ X, family = Gamma(link = "log"))
summary(glm1)
#> 
#> Call:
#> glm(formula = Y_hat ~ X, family = Gamma(link = "log"))
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.6617  -0.3676  -0.2828  -0.2056   4.8995  
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.54403    0.10210   5.329 1.22e-07 ***
#> X            0.29473    0.04777   6.170 9.93e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Gamma family taken to be 1.459691)
#> 
#>     Null deviance: 497.06  on 999  degrees of freedom
#> Residual deviance: 442.11  on 998  degrees of freedom
#> AIC: 3915.2
#> 
#> Number of Fisher Scoring iterations: 5

Coefficients, delogged

exp(coef(glm1))
#> (Intercept)           X 
#>    1.722940    1.342758

The delogged (exponentiated) coefficients give the factor by which the mean of the outcome variable changes per one-unit change in X, analogous to the coefficients of an ordinary linear model. The main difference is that each coefficient denotes a multiplicative change rather than an additive one.
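To see the multiplicative interpretation directly, here is a small check (not part of the original output): the ratio of the predicted means at \(X = 2\) versus \(X = 1\) should equal the exponentiated X coefficient.

# predicted mean at X = 2 divided by predicted mean at X = 1 ...
exp(coef(glm1)[1] + coef(glm1)[2] * 2) / exp(coef(glm1)[1] + coef(glm1)[2] * 1)
# ... equals the exponentiated X coefficient
exp(coef(glm1)[2])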

Prediction time

We’d expect that, if \(x_i = 1\), the predicted mean of \(\hat{y}_i\) is \(1.72 \times 1.34 \approx 2.31\).

Let’s check that.

predict(glm1, newdata = data.frame(X=1))
#>         1 
#> 0.8387581

That prediction is on the log (link) scale. Let’s delog it:

predict(glm1, newdata = data.frame(X=1)) %>% exp()
#>        1 
#> 2.313492

Works!
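Equivalently, the same number falls out of the coefficients directly, since with the log link the predicted mean at \(x_i = 1\) is \(\exp(\beta_0 + \beta_1 \cdot 1)\):

# manual check: exp(intercept + slope * 1)
unname(exp(coef(glm1)["(Intercept)"] + coef(glm1)["X"] * 1))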