Simulating data for a Gamma regression

Load packages

library(tidyverse)

Intro

A Gamma distribution is useful for modeling positive, right skewed data such as waiting times; it is a continuous function.

In this post, we’ll illustrate some properties of the Gamma distribution by simulating a toy example.

Simulate data and define structural model

Let X be a discrete variable following uniform distribution, and xi{1,2,3}.

set.seed(42)

n <- 1000
X <- sample(x = c(1,2,3), size = n, replace = TRUE)

hist(X)

Let yi=0.1+1xi, by extension let ^yi=0.1+0.1xi+ϵi, where ϵiΓ(m=0.1,s=1/m), and Γ refers to the Gamma distribution with location m and shape s.

e <- rgamma(n, shape = 0.1, scale = 1/0.1)

Compute Y and ˆ(Y):

Y = 0.1 + 1 * X
Y_hat = Y + e
hist(Y_hat)

plot(X, Y_hat)

Regression

glm1 <- glm(Y_hat ~ X, Gamma(link = "log"))
summary(glm1)
#> 
#> Call:
#> glm(formula = Y_hat ~ X, family = Gamma(link = "log"))
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.6617  -0.3676  -0.2828  -0.2056   4.8995  
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.54403    0.10210   5.329 1.22e-07 ***
#> X            0.29473    0.04777   6.170 9.93e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Gamma family taken to be 1.459691)
#> 
#>     Null deviance: 497.06  on 999  degrees of freedom
#> Residual deviance: 442.11  on 998  degrees of freedom
#> AIC: 3915.2
#> 
#> Number of Fisher Scoring iterations: 5

Coefficients, delogged

exp(coef(glm1))
#> (Intercept)           X 
#>    1.722940    1.342758

The delogged (exponentiated) coefficients provide a factor by which the mean of the outcome variable will change per unit change of X, similar to a normal linear model. The main difference is that the the coefficient denotes a multiplicative change rather than an additive change.

Prediction time

We’d expect that, if xi=1, that yi=1.721.342.31.

Let’s check that.

predict(glm1, newdata = data.frame(X=1))
#>         1 
#> 0.8387581

That’s in log-notation. Let’s delog:

predict(glm1, newdata = data.frame(X=1)) %>% exp()
#>        1 
#> 2.313492

Works!