Say, you have to choose between two doctors (Anna and Berta). To decide which one is better, you check their success rates. Suppose that they deal with two conditions (Coolities and Dummities). So let’s compare their success rate for each of the two conditions (and the total success rate):

This is the proportion of healing (success) of the first doctor, Dr. Anna for each of the two conditions:

Coolities: 7 out of 8 patients are healed from Coolities
Dummieties: 1 out of 2 patients are healed from Dummities

This is the proportion of healing (success) of the first doctor, Dr. Berta for each of the two conditions:

Coolities: 2 out of 2 patients are healed from Coolities
Dummieties: 5 out of 8 patients are healed from Dummities

Now, which doc should you choose? Anna or Berta?

For Coolities: Berta is better than Anna. For Dummities: Berta ist better than Anna.

In each of the conditions, Berta is superior to Anna. So you should choose Anna, and give her a visit as soon as possible, right?

See here the Data in R code:

library(tidyverse)
df <- tribble(
  ~name,   ~coolities, ~dummities,
   "Anna", 7/8,         1/2,
  "Berta", 2/2,         5/5
)
df

## # A tibble: 2 × 3
##   name  coolities dummities
##   <chr>     <dbl>     <dbl>
## 1 Anna      0.875       0.5
## 2 Berta     1           1

df <- tribble(
  ~name,   ~condition,  ~healed, ~lost, ~success_rate,
   "Anna", "coolities",  7L, 1L, 7/8,
  "Anna", "dummities",  1L, 1L, 1/2,
  "Berta", "coolities",  2L, 0L, 2/2,
  "Berta", "dummities", 5L, 3L, 5/8
)
df

## # A tibble: 4 × 5
##   name  condition healed  lost success_rate
##   <chr> <chr>      <int> <int>        <dbl>
## 1 Anna  coolities      7     1        0.875
## 2 Anna  dummities      1     1        0.5  
## 3 Berta coolities      2     0        1    
## 4 Berta dummities      5     3        0.625

Let’s plot the success rates of both doctors:

df %>% 
  mutate(treated = healed+lost) %>% 
  ggplot(aes(x = name, y = success_rate)) +
  geom_col(aes(fill = name)) +
  facet_wrap(~condition)

As we see, Berta’s success rate is in both cases higher than Annas. So Berta is better!?

To better understand what’s going on, let’s plot absolute values. First, reformat from long to wide:

df %>% 
  gather(key = outcome, value = patient_count, -c(name, success_rate, condition)) -> df_long

df_long

## # A tibble: 8 × 5
##   name  condition success_rate outcome patient_count
##   <chr> <chr>            <dbl> <chr>           <int>
## 1 Anna  coolities        0.875 healed              7
## 2 Anna  dummities        0.5   healed              1
## 3 Berta coolities        1     healed              2
## 4 Berta dummities        0.625 healed              5
## 5 Anna  coolities        0.875 lost                1
## 6 Anna  dummities        0.5   lost                1
## 7 Berta coolities        1     lost                0
## 8 Berta dummities        0.625 lost                3

Then plot:

df_long %>% 
  ggplot(aes(x = name, y = patient_count, fill = outcome)) +
  geom_bar(stat = "identity", position = "stack") +
  facet_wrap(~condition)

Beautify somewhat:

df_long$outcome_f <- factor(x = df_long$outcome,
                                  levels = c("lost",
                                             "healed"
                                             ))
p_docs1 <- df_long %>% 
  ggplot(aes(x = name, y = patient_count, fill = outcome_f)) +
  geom_bar(stat = "identity", position = "stack") +
  facet_wrap(~condition) +
  scale_y_continuous(breaks = 0:10, limits = c(0, 10)) -> p_docs1
p_docs1

Clearly - Berta is better in each of the two conditions, and Anna is worse. But note: The counts, ie. the number of patients differ! This will explain the “paradox”. Anna has lots of Coolities treatments (and Berta not), whereas the opposite is true for Dummities: Anna few Dummities treatments, but Berta many of them.

Ok. Let’s check the overall, total success rate:

df %>% 
  group_by(name) %>% 
  summarise(healed_total = sum(healed),
            lost_total = sum(lost),
            success_rate_total = healed_total / 
              (healed_total + lost_total)) -> df_grouped
df_grouped

## # A tibble: 2 × 4
##   name  healed_total lost_total success_rate_total
##   <chr>        <int>      <int>              <dbl>
## 1 Anna             8          2                0.8
## 2 Berta            7          3                0.7

Wait! Annas total success rate is better than Bertas! Can that be?

Let’s plot again:

df_grouped %>% 
  ggplot(aes(x = name, y = success_rate_total)) +
  geom_col(aes(fill = name))

Let’s plot slightly different. First, reformat the dataframe to long format:

df_grouped %>%
  gather(key = outcome, value = patient_count, -c(name, success_rate_total)) -> df_grouped_long

df_grouped_long

## # A tibble: 4 × 4
##   name  success_rate_total outcome      patient_count
##   <chr>              <dbl> <chr>                <int>
## 1 Anna                 0.8 healed_total             8
## 2 Berta                0.7 healed_total             7
## 3 Anna                 0.8 lost_total               2
## 4 Berta                0.7 lost_total               3

Then plot it:

df_grouped_long %>% 
  ggplot(aes(x = name, y = patient_count, fill = outcome)) +
  geom_col()

Beautify somewhat:

df_grouped_long$outcome <- factor(x = df_grouped_long$outcome,
                                  levels = c("lost_total",
                                             "healed_total"
                                             ))

df_grouped_long %>% 
  ggplot(aes(x = name, y = patient_count, fill = outcome)) +
  geom_bar(stat = "identity") -> p_docs2
p_docs2

Debrief

Why is that? It needs some time to digest that there’s no paradox here going on. The point is that Anna many Coolities cases help her to boost her overall rate. In contrast, Berta’s focus on Dummities pulls her down. Imagine that Coolities is simple to treat, whereas Dummities is much harder to treat. So Berta is focusing on the more difficult ailment; in consequence her overall success rate is comparatively low.

Compare that to a weighted average: The success rate is weighted by the case number for each doctor and each condition.

Maybe it helps to put the two cases (rates per treatment vs. rates in total) in perspective. Here’s an visual comparison:

library(gridExtra) 
grid.arrange(p_docs1, p_docs2, nrow = 1)

Intuition to Simpson's paradox

Debrief