How to convert raw scores to different types of standardized scores

A common task in applied research, for example in some areas of psychology, is to convert raw scores into some type of standardized score, such as z-scores.

This post shows one way to do that.

Load packages

library(tidyverse)

Load some psychometric data

data("extra", package = "pradadata")

The data can be downloaded here.

The dataset contains responses to items measuring extraversion (the personality trait), along with some correlates of extraversion.

Let’s have a look.

glimpse(extra)
#> Observations: 826
#> Variables: 34
#> $ timestamp          <chr> "11.03.2015 19:17:48", "11.03.2015 19:18:05",…
#> $ code               <chr> "HSC", "ERB", "ADP", "KHB", "PTG", "ABL", "be…
#> $ i01                <int> 3, 2, 3, 3, 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 3, …
#> $ i02r               <int> 3, 2, 4, 3, 3, 2, 4, 3, 4, 4, 3, 4, 3, 3, 3, …
#> $ i03                <int> 3, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 4, 1, …
#> $ i04                <int> 3, 2, 4, 4, 4, 4, 3, 3, 4, 4, 3, 3, 2, 4, 3, …
#> $ i05                <int> 4, 3, 4, 3, 4, 2, 3, 2, 3, 3, 3, 2, 3, 3, 3, …
#> $ i06r               <int> 4, 2, 1, 3, 3, 3, 3, 2, 4, 3, 3, 3, 3, 3, 3, …
#> $ i07                <int> 3, 2, 3, 3, 4, 4, 2, 3, 3, 3, 2, 4, 2, 3, 3, …
#> $ i08                <int> 2, 3, 2, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 3, 4, …
#> $ i09                <int> 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 4, 2, 4, 4, 4, …
#> $ i10                <int> 1, 1, 1, 2, 4, 3, 2, 1, 2, 3, 1, 3, 2, 3, 2, …
#> $ n_facebook_friends <dbl> 250, 106, 215, 200, 100, 376, 180, 432, 200, …
#> $ n_hangover         <dbl> 1, 0, 0, 15, 0, 1, 1, 2, 5, 0, 1, 2, 20, 2, 2…
#> $ age                <int> 24, 35, 25, 39, 29, 33, 24, 28, 29, 38, 25, 5…
#> $ sex                <chr> "Frau", "Frau", "Frau", "Frau", "Frau", "Mann…
#> $ extra_single_item  <int> 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4, 4, 4, …
#> $ time_conversation  <dbl> 10, 15, 15, 5, 5, 20, 2, 15, 10, 10, 1, 5, 5,…
#> $ presentation       <chr> "nein", "nein", "nein", "nein", "nein", "ja",…
#> $ n_party            <dbl> 20, 5, 3, 25, 4, 4, 3, 6, 12, 5, 10, 5, 10, 1…
#> $ clients            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ extra_vignette     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ i21                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ extra_vignette2    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ major              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ smoker             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ sleep_week         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ sleep_wend         <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ clients_freq       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ extra_mean         <dbl> 2.9, 2.1, 2.6, 2.9, 3.2, 2.8, 2.8, 2.5, 3.2, …
#> $ extra_md           <dbl> 3.0, 2.0, 3.0, 3.0, 3.5, 3.0, 3.0, 2.5, 3.5, …
#> $ extra_aad          <dbl> 0.56, 0.54, 1.08, 0.36, 0.80, 0.68, 0.68, 0.7…
#> $ extra_mode         <dbl> 3, 2, 1, 3, 4, 3, 3, 2, 4, 3, 3, 3, 3, 3, 3, …
#> $ extra_iqr          <dbl> 0.00, 0.75, 2.50, 0.00, 1.00, 0.75, 0.75, 1.0…

Ah, extra_mean is our man. That’s the raw score.

Convert to z-score with some given mean and sd

Suppose the test manual or some other authority states that the mean equals 3 and the sd equals 1. OK, let’s compute z-values using these parameters.

Here’s the formula for z-values:

\[ z = \frac{x - \mu}{sd}\]

where x is the vector of raw scores, \(\mu\) is the mean, and sd is the standard deviation.

extra <-
  extra %>% 
  mutate(extra_z = (extra_mean - 3)/1)
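As a small worked example of the formula, here is a minimal sketch in base R. The helper `to_z()` and the toy vector `x` are made up for illustration and are not part of the `extra` data or of any package. The sketch also shows the alternative of standardizing against the sample's own mean and sd when no external norms are available.

```r
# Toy raw scores (made up for illustration)
x <- c(2.9, 2.1, 2.6, 3.2)

# Hypothetical helper: z-score relative to a given mean (mu) and sd (sigma)
to_z <- function(x, mu, sigma) (x - mu) / sigma

# Variant 1: norm values taken from a manual (mean 3, sd 1, as in the text)
z_norm <- to_z(x, mu = 3, sigma = 1)

# Variant 2: no norms given, so standardize against the sample itself
z_sample <- to_z(x, mu = mean(x), sigma = sd(x))

print(z_norm)
print(round(z_sample, 2))
```

With the second variant, the resulting z-scores have mean 0 and sd 1 in the sample by construction, so they describe a person's position relative to this sample rather than relative to a norm population.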

Convert to Stanine

The z-scores can be converted to stanine scores using this scheme:

  • stanine 1: below -1.75
  • stanine 2: -1.75 to -1.25
  • stanine 3: -1.25 to -0.75
  • stanine 4: -0.75 to -0.25
  • stanine 5: -0.25 to 0.25
  • stanine 6: 0.25 to 0.75
  • stanine 7: 0.75 to 1.25
  • stanine 8: 1.25 to 1.75
  • stanine 9: above 1.75

Now let’s compare the z-score to the ranges of stanine scores.

extra <-
  extra %>% 
  drop_na(extra_mean) %>%
  mutate(extra_stanine = case_when(
    extra_z < -1.75 ~ 1,
    extra_z < -1.25 ~ 2,
    extra_z < -0.75 ~ 3,
    extra_z < -0.25 ~ 4,
    extra_z < 0.25 ~ 5,
    extra_z < 0.75 ~ 6,
    extra_z < 1.25 ~ 7,
    extra_z < 1.75 ~ 8,
    TRUE ~ 9
  ))
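The same binning can be written more compactly with base R's `findInterval()`. This is only a sketch of an alternative, not the approach used in this post. The vectors `breaks` and `z` are made-up names, and the cut points are the usual stanine boundaries in steps of half a standard deviation.

```r
# Usual stanine cut points on the z-scale (half-sd steps around 0)
breaks <- c(-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75)

# Toy z-scores (made up for illustration)
z <- c(-2, -0.9, -0.4, -0.1, 0.2, 0.25, 1.8)

# findInterval() counts how many cut points are <= each z-value (0 to 8);
# adding 1 shifts that count to the stanine range 1 to 9
stanine <- findInterval(z, breaks) + 1

print(stanine)
```

A value lying exactly on a cut point (such as 0.25 here) goes to the higher stanine, which matches the strict `<` comparisons in the `case_when()` call. Missing values stay `NA`.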

Convert to percentiles

Note that extra_p is the empirical percentile rank, that is, it is based on the empirical cumulative distribution of the data at hand.

That’s quite straightforward:

extra <- 
  extra %>% 
    mutate(extra_p = percent_rank(extra_mean),
           extra_p_theoretical = pnorm(extra_z))

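extra_p_theoretical, in contrast, is derived from extra_z via pnorm() and gives the percentile expected under a normal distribution. Both columns are proportions between 0 and 1; multiply by 100 to express them as percentiles.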
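To make the difference between the two kinds of percentiles concrete, here is a minimal base R sketch on a made-up toy vector. It assumes that `percent_rank()` behaves as documented in dplyr, i.e. it counts the values smaller than each score and divides by the number of observations minus 1. The names `x`, `p_emp` and `p_theo` are illustrative only.

```r
# Toy raw scores (made up for illustration)
x <- c(2.9, 2.1, 2.6, 2.9, 3.2, 2.8)

# Empirical percentile rank: share of the other scores that are smaller.
# Ties get the minimum rank, mirroring the documented behaviour of percent_rank().
p_emp <- (rank(x, ties.method = "min") - 1) / (length(x) - 1)

# Theoretical percentile: area below z under the standard normal curve,
# using the assumed norm values from the text (mean 3, sd 1)
p_theo <- pnorm((x - 3) / 1)

print(p_emp)
print(round(p_theo, 3))
```

The empirical values depend entirely on the sample at hand (the smallest score always gets 0, the largest 1), whereas the theoretical values depend only on the assumed mean and sd and on the normality assumption.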

Check the results

extra %>% 
  select(extra_mean, extra_z, extra_stanine, extra_p, 
         extra_p_theoretical) %>% 
  head()
#> # A tibble: 6 x 5
#>   extra_mean extra_z extra_stanine extra_p extra_p_theoretical
#>        <dbl>   <dbl>         <dbl>   <dbl>               <dbl>
#> 1        2.9  -0.1               5  0.431                0.460
#> 2        2.1  -0.900             3  0.0451               0.184
#> 3        2.6  -0.400             4  0.217                0.345
#> 4        2.9  -0.1               5  0.431                0.460
#> 5        3.2   0.2               5  0.691                0.579
#> 6        2.8  -0.2               5  0.359                0.421