A common task in applied research, for example in some areas of psychology, is converting raw scores into standardized scores such as z-scores.
This post shows one way to do that.
Load packages
library(tidyverse)
Load some psychometric data
data("extra", package = "pradadata")
The data can be downloaded here.
The dataset shows some data on extraversion (the personality trait) items along with some correlates of extraversion.
Let’s have a look.
glimpse(extra)
#> Observations: 826
#> Variables: 34
#> $ timestamp <chr> "11.03.2015 19:17:48", "11.03.2015 19:18:05",…
#> $ code <chr> "HSC", "ERB", "ADP", "KHB", "PTG", "ABL", "be…
#> $ i01 <int> 3, 2, 3, 3, 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 3, …
#> $ i02r <int> 3, 2, 4, 3, 3, 2, 4, 3, 4, 4, 3, 4, 3, 3, 3, …
#> $ i03 <int> 3, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 4, 1, …
#> $ i04 <int> 3, 2, 4, 4, 4, 4, 3, 3, 4, 4, 3, 3, 2, 4, 3, …
#> $ i05 <int> 4, 3, 4, 3, 4, 2, 3, 2, 3, 3, 3, 2, 3, 3, 3, …
#> $ i06r <int> 4, 2, 1, 3, 3, 3, 3, 2, 4, 3, 3, 3, 3, 3, 3, …
#> $ i07 <int> 3, 2, 3, 3, 4, 4, 2, 3, 3, 3, 2, 4, 2, 3, 3, …
#> $ i08 <int> 2, 3, 2, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 3, 4, …
#> $ i09 <int> 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 4, 2, 4, 4, 4, …
#> $ i10 <int> 1, 1, 1, 2, 4, 3, 2, 1, 2, 3, 1, 3, 2, 3, 2, …
#> $ n_facebook_friends <dbl> 250, 106, 215, 200, 100, 376, 180, 432, 200, …
#> $ n_hangover <dbl> 1, 0, 0, 15, 0, 1, 1, 2, 5, 0, 1, 2, 20, 2, 2…
#> $ age <int> 24, 35, 25, 39, 29, 33, 24, 28, 29, 38, 25, 5…
#> $ sex <chr> "Frau", "Frau", "Frau", "Frau", "Frau", "Mann…
#> $ extra_single_item <int> 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4, 4, 4, …
#> $ time_conversation <dbl> 10, 15, 15, 5, 5, 20, 2, 15, 10, 10, 1, 5, 5,…
#> $ presentation <chr> "nein", "nein", "nein", "nein", "nein", "ja",…
#> $ n_party <dbl> 20, 5, 3, 25, 4, 4, 3, 6, 12, 5, 10, 5, 10, 1…
#> $ clients <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ extra_vignette <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ i21 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ extra_vignette2 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ major <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ smoker <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ sleep_week <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ sleep_wend <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ clients_freq <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ extra_mean <dbl> 2.9, 2.1, 2.6, 2.9, 3.2, 2.8, 2.8, 2.5, 3.2, …
#> $ extra_md <dbl> 3.0, 2.0, 3.0, 3.0, 3.5, 3.0, 3.0, 2.5, 3.5, …
#> $ extra_aad <dbl> 0.56, 0.54, 1.08, 0.36, 0.80, 0.68, 0.68, 0.7…
#> $ extra_mode <dbl> 3, 2, 1, 3, 4, 3, 3, 2, 4, 3, 3, 3, 3, 3, 3, …
#> $ extra_iqr <dbl> 0.00, 0.75, 2.50, 0.00, 1.00, 0.75, 0.75, 1.0…
Ah, extra_mean is our man. That’s the raw score.
Convert to z-score with some given mean and sd
Say the test manual or some other authority states that the mean equals 3 and the sd equals 1. OK, let’s compute z-values using these parameters.
Here’s the formula for z-values:
\[ z = \frac{x - \mu}{sd}\]
where x is the vector of raw scores, \(\mu\) is the mean, and sd is the standard deviation.
# z-score using the reference values from above (mean = 3, sd = 1)
extra <-
  extra %>%
  mutate(extra_z = (extra_mean - 3) / 1)
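To make the formula reusable for other reference values, it can be wrapped in a small helper. The following is only a minimal sketch in base R: the function name `to_z()` and the example scores are made up for illustration and are not part of the dataset or of any package.

```r
# Hypothetical helper: z-transform raw scores given a reference mean and sd.
to_z <- function(x, mu, sigma) {
  stopifnot(is.numeric(x), sigma > 0)
  (x - mu) / sigma
}

# Made-up raw scores, using the reference values from the text (mean 3, sd 1)
raw <- c(2.9, 2.1, 2.6, 3.2)
z <- to_z(raw, mu = 3, sigma = 1)
print(z)
```

If no reference values are available, one option is to standardize against the sample itself, for instance with `to_z(raw, mean(raw), sd(raw))`. The resulting z-values then describe positions within this sample only.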
Convert to Stanine
The z-scores can be converted to stanines using this scheme:
- stanine 1: below -1.75
- stanine 2: -1.75 to -1.25
- stanine 3: -1.25 to -0.75
- stanine 4: -0.75 to -0.25
- stanine 5: -0.25 to 0.25
- stanine 6: 0.25 to 0.75
- stanine 7: 0.75 to 1.25
- stanine 8: 1.25 to 1.75
- stanine 9: above 1.75
Now let’s compare the z-score to the ranges of stanine scores.
extra <-
  extra %>%
  # drop missing scores first; otherwise they would fall through to stanine 9
  drop_na(extra_mean) %>%
  mutate(extra_stanine = case_when(
    extra_z < -1.75 ~ 1,
    extra_z < -1.25 ~ 2,
    extra_z < -0.75 ~ 3,
    extra_z < -0.25 ~ 4,
    extra_z < 0.25 ~ 5,
    extra_z < 0.75 ~ 6,
    extra_z < 1.25 ~ 7,
    extra_z < 1.75 ~ 8,
    TRUE ~ 9
  ))
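The same lookup can also be written more compactly with base R's `findInterval()`. This is a sketch of an alternative, not the approach used in this post: the helper name `to_stanine()` and the example z-values are made up for illustration.

```r
# Upper bounds of stanines 1 to 8 on the z-scale
stanine_breaks <- c(-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75)

# Hypothetical helper: findInterval() counts how many breaks are <= z,
# so adding 1 yields the stanine (1 to 9). Missing values stay NA.
to_stanine <- function(z) {
  findInterval(z, stanine_breaks) + 1L
}

# Made-up z-values for illustration
z_example <- c(-2, -0.9, -0.1, 0.2, 1.6, 2.3, NA)
stanines <- to_stanine(z_example)
print(stanines)
```

A value that sits exactly on a boundary goes to the higher stanine, which matches the strict `<` comparisons in the `case_when()` version. Because missing values remain `NA` here, this variant would not need the `drop_na()` step.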
Convert to percentiles
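Note that the empirical percentiles here are based on the empirical cumulative distribution of the observed data: percent_rank() rescales each score’s rank in the sample to the range from 0 to 1.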
That’s quite straightforward:
extra <-
  extra %>%
  mutate(extra_p = percent_rank(extra_mean),
         extra_p_theoretical = pnorm(extra_z))
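Use extra_z with pnorm() to get the theoretically derived percentiles, that is, the percentiles expected if the scores follow a normal distribution with the given mean and sd.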
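To see how the two kinds of percentiles differ, here is a small base R sketch on made-up scores. It assumes data without missing values and mirrors the definition of `percent_rank()` as the minimum rank minus 1, divided by the number of values minus 1. The variable names are chosen for illustration only.

```r
# Made-up raw scores for illustration
raw <- c(2.1, 2.6, 2.8, 2.9, 3.2)

# Empirical percentile rank (for data without missing values):
# (minimum rank - 1) / (number of values - 1)
p_empirical <- (rank(raw, ties.method = "min") - 1) / (length(raw) - 1)

# Theoretical percentile under a normal distribution with mean 3 and sd 1
p_theoretical <- pnorm((raw - 3) / 1)

print(round(cbind(raw, p_empirical, p_theoretical), 3))
```

The empirical values depend only on the order of the scores within the sample, whereas the theoretical values depend on the assumed mean and sd. The two can therefore diverge noticeably when the sample does not match the reference distribution.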
Check the results
extra %>%
  select(extra_mean, extra_z, extra_stanine, extra_p,
         extra_p_theoretical) %>%
  head()
#> # A tibble: 6 x 5
#>   extra_mean extra_z extra_stanine extra_p extra_p_theoretical
#>        <dbl>   <dbl>         <dbl>   <dbl>               <dbl>
#> 1        2.9  -0.1               5  0.431                0.460
#> 2        2.1  -0.900             3  0.0451               0.184
#> 3        2.6  -0.400             4  0.217                0.345
#> 4        2.9  -0.1               5  0.431                0.460
#> 5        3.2   0.2               5  0.691                0.579
#> 6        2.8  -0.2               5  0.359                0.421