Load packages
library(tidyverse)
library(mosaic)
Data
data("tips", package = "reshape2")
Non-parametric tests and simulation based inference
Simulation-based inference (SBI) is an old tool that has seen a surge in research interest in recent years probably due to the large amount of computational powers at the hands of researchers.
SBI is less prone to violations of assumptions, particularly with distributional assumptions. This is because inference is not based on the idea that some variable follows a – for example – normal distribution.
As a consequence there is much less need to seek shelter at nonparametric tests in case distributional assumptions are violated.
In this post, I’ll show how the Kurskal-Wallis test can be applied using SBI and how to translate it to a linear model.
Research question
Is there a difference in mean tip across different days?
Kruskall test the classic way
k_test <- kruskal.test(tip ~ day, data = tips)
k_test
#>
#> Kruskal-Wallis rank sum test
#>
#> data: tip by day
#> Kruskal-Wallis chi-squared = 8.5656, df = 3, p-value = 0.03566
Let’s have a closer look to the resulting object:
str(k_test)
#> List of 5
#> $ statistic: Named num 8.57
#> ..- attr(*, "names")= chr "Kruskal-Wallis chi-squared"
#> $ parameter: Named int 3
#> ..- attr(*, "names")= chr "df"
#> $ p.value : num 0.0357
#> $ method : chr "Kruskal-Wallis rank sum test"
#> $ data.name: chr "tip by day"
#> - attr(*, "class")= chr "htest"
We can extract the statistic in this way:
k_test$statistic
#> Kruskal-Wallis chi-squared
#> 8.565588
pluck(k_test, "statistic")
#> Kruskal-Wallis chi-squared
#> 8.565588
Or, alternatively:
kruskal.test(tip ~ day, data = tips)$statistic
#> Kruskal-Wallis chi-squared
#> 8.565588
The statistic is defined as shown here:
\[ H = (N-1)\frac{\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^g\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2} \]
And this statistic is asymptotically \(\chi^2\) distributed.
Kruskall test via SBI
null1 <- mosaic::do(1000) * kruskal.test(tip ~ shuffle(day),
data = tips)
Let’s check the results:
null1 %>%
head() %>%
knitr::kable()
Kruskal.Wallis.chi.squared | df | p.value | method | alternative | data | .row | .index |
---|---|---|---|---|---|---|---|
3.1688452 | 3 | 0.3663180 | Kruskal-Wallis rank sum test | NA | tip by shuffle(day) | 1 | 1 |
0.0094803 | 3 | 0.9997552 | Kruskal-Wallis rank sum test | NA | tip by shuffle(day) | 1 | 2 |
11.7318235 | 3 | 0.0083606 | Kruskal-Wallis rank sum test | NA | tip by shuffle(day) | 1 | 3 |
4.3075100 | 3 | 0.2301161 | Kruskal-Wallis rank sum test | NA | tip by shuffle(day) | 1 | 4 |
1.5435937 | 3 | 0.6722461 | Kruskal-Wallis rank sum test | NA | tip by shuffle(day) | 1 | 5 |
1.4203264 | 3 | 0.7007771 | Kruskal-Wallis rank sum test | NA | tip by shuffle(day) | 1 | 6 |
Plot it
Let’s add the empirical value as a vertical line.
gf_histogram( ~ Kruskal.Wallis.chi.squared, data = null1) %>%
gf_vline(xintercept = ~ k_test$statistic)
Let’s color the top-5 percent red, to highlight the critical value.
null1 <- null1 %>%
mutate(is_top5percent = percent_rank(Kruskal.Wallis.chi.squared) > .95)
gf_histogram( ~ Kruskal.Wallis.chi.squared,
data = null1,
fill = ~ is_top5percent) %>%
gf_vline(xintercept = ~ k_test$statistic)
Nice, but …
As outlined above, there may be no need to bypass the test of the mean. Using SBI one does not have to assume normal distribution of the variable tested if one would like to compute p-values.
Linear model for comparing means
lm1 <- lm(tip ~ day, data = tips)
lm1_sum <- summary(lm1)
lm1_sum
#>
#> Call:
#> lm(formula = tip ~ day, data = tips)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.2451 -0.9931 -0.2347 0.5382 7.0069
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2.73474 0.31612 8.651 7.46e-16 ***
#> daySat 0.25837 0.34893 0.740 0.460
#> daySun 0.52039 0.35343 1.472 0.142
#> dayThur 0.03671 0.36132 0.102 0.919
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.378 on 240 degrees of freedom
#> Multiple R-squared: 0.02048, Adjusted R-squared: 0.008232
#> F-statistic: 1.672 on 3 and 240 DF, p-value: 0.1736
As can be seen, the \(F\) value is not significant thereby questioning the difference in means hypothesis. Put differently: The models sees no strong evidence for rejecting the null.
The F value, along with its 3 df values, can be plucked out like this:
emp_F_value <- pluck(lm1_sum, "fstatistic")
emp_F_value
#> value numdf dendf
#> 1.672355 3.000000 240.000000
Note that this is a vector of three 3 elements.
Linear model using SBI
Now the trick.
boot1 <- do(1000) * lm(tip ~ day, data = resample(tips))
confint(boot1)
#> name lower upper level method estimate
#> 1 Intercept 2.270318182 3.2140346 0.95 percentile 2.73473684
#> 2 daySat -0.304165972 0.8272075 0.95 percentile 0.25836661
#> 3 daySun 0.030470180 1.0744534 0.95 percentile 0.52039474
#> 4 dayThur -0.480811080 0.6252964 0.95 percentile 0.03671477
#> 5 sigma 1.157816369 1.5729610 0.95 percentile 1.37793112
#> 6 r.squared 0.005899732 0.0758314 0.95 percentile 0.02047639
#> 7 F 0.474779633 6.5642919 0.95 percentile 1.67235520
Pluck the r-squared from the lm object
Here’s a way to pluck the R squared value from the lm object:
lm(tip ~ day, data = tips) %>%
summary() %>%
pluck("r.squared")
#> [1] 0.02047639
Plot the results
gf_histogram(~ r.squared, data = boot1) %>%
gf_vline( xintercept = ~ lm(tip ~ day, data = tips) %>%
summary() %>%
pluck("r.squared"))
Interestingly, the R squeared distribution is not normally distributed. In sum, the R squared are really small.
Test the null using lm and SBI
null2 <- do(1000) * lm(tip ~ shuffle(day), data = tips)
Let’s check the \(F\) statistic. The critical value of the \(F\) statistic is, given 3 and 240 degrees of freedom (as shown by the lm output):
crit_f <- qf(p = 0.95,
df1 = 3,
df2 = 240)
crit_f
#> [1] 2.642213
Plot it:
null2 <- null2 %>%
mutate(top_5percent = percent_rank(F) > 0.95)
gf_histogram(~ F, data = null2,
fill =~top_5percent) %>%
gf_vline(xintercept = ~emp_F_value[1]) %>%
gf_label(label = ~ paste0("F: ",
emp_F_value[1] %>% round(2)),
x = ~ emp_F_value[1],
y = 0,
show.legend = FALSE)
As can be seen, the F value is not significant.
Some summary statistics to this distribution:
favstats(~ F, data = null2)
#> min Q1 median Q3 max mean sd n
#> 0.006571185 0.3887514 0.7581193 1.293386 5.553417 0.9314424 0.7433274 1000
#> missing
#> 0
Caution
Note that the ASA advises that the p-value should not be the (sole) basis of decision in any data analysis.
Bayesian statistics is a neat alternative.
ANOVA
As can be seen, the Kruskal Wallis and the F tests yield different results. This is probably due to the fact that we tested median differences (Kruskall) and later on mean differences (F test).
As a check, here’s a standard ANOVA:
aov1 <- aov(tip ~ day, data = tips)
summary(aov1)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> day 3 9.5 3.175 1.672 0.174
#> Residuals 240 455.7 1.899
Replicating the F value decision from the SBI analysis above.
Plotting medians vs means
As a further check, let’s plot median and mean differences for the sake of comparison.
gf_point(tip ~ day,
data = tips,
stat = "summary",
fun = mean,
shape = 3,
size = 5,
color = "blue") +
stat_summary(geom = "point",
fun = median,
color = "red",
size = 5,
shape = 5) +
labs(caption = "Crosses: Mean, diamonds: Median")
As ca be seen, the median values are more spread out, hence it appears sensible that the test would get significant whereas the more clustered mean values do not show a significant difference in our test.