Suppose you have a dataframe with a large number of columns, and you want to plot each column, say with one histogram per column.
This post shows some ways of achieving this.
Let’s take the mtcars dataset as an example.
data(mtcars)
We will use the tidyverse approach:
library(tidyverse)
Way 1
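mtcars %>%
  select_if(is.numeric) %>%   # keep numeric columns only
  map2(., names(.), ~ {
    # .x is one column (a vector), .y is its name
    # data_frame() still works but is deprecated in favor of tibble()
    ggplot(data = data_frame(.x), aes(x = .x)) +
      geom_histogram() +
      labs(x = .y)
  })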
#> (output: a list of eleven histograms, one per column: $mpg, $cyl, $disp, $hp, $drat, $wt, $qsec, $vs, $am, $gear, $carb)
Some explanations:
- First, we take the dataset mtcars.
- Then, we map a function (i.e., ggplot()) to each column of mtcars, but we also pass the names of mtcars.
- ggplot() likes dataframes, but the map() family serves lists/vectors, so we have to wrap each vector in a dataframe using data_frame().
- The data comes from the first list (mtcars); that's what .x refers to.
- The names come from the second list (names(mtcars)); that's what .y points to.
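To make the roles of .x and .y more concrete, here is a minimal sketch on a toy list. The list cols and the label text are made up for illustration and are not part of the original example. It also shows that imap() can stand in for map2(., names(.), ...) when the input is named.

```r
library(purrr)

# Toy input (hypothetical): two short numeric vectors in a named list
cols <- list(a = c(1, 2, 3), b = c(10, 20))

# map2() walks both inputs in parallel: .x is the element, .y is its name
labels <- map2(cols, names(cols), ~ paste0(.y, ": mean = ", mean(.x)))

# imap() is shorthand for map2(x, names(x), ...) when x has names
labels_imap <- imap(cols, ~ paste0(.y, ": mean = ", mean(.x)))

# Both calls produce the same named list
identical(labels, labels_imap)
```

In the plotting code, the same pattern applies: each column plays the role of .x and its name plays the role of .y.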
Way 2
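A perhaps simpler way is to reshape the data to long format and let facets do the work (here with density plots instead of histograms):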
mtcars %>%
  gather(key = item, value = value) %>%
  ggplot() +
  aes(x = value) +
  geom_density() +
  facet_wrap(~ item, ncol = 2, scales = "free")
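If you prefer histograms here as well, the same idea can be sketched with pivot_longer(), which tidyr now recommends over gather(). This is a variant under assumptions, not the original author's code: the object names mtcars_long and p and the choice of bins = 10 are arbitrary.

```r
library(tidyverse)

data(mtcars)

# Reshape to long format: one row per (column name, value) pair
mtcars_long <- mtcars %>%
  pivot_longer(cols = everything(), names_to = "item", values_to = "value")

# One histogram panel per original column; bins = 10 is an arbitrary choice
p <- ggplot(mtcars_long, aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ item, ncol = 2, scales = "free")

p
```

Using scales = "free" matters because the columns live on very different scales (compare disp and am, for instance).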
Test if column is normally distributed before doing anything else
Suppose we want to check whether a column is roughly normally distributed before plotting. Here is one way to check that, using the Shapiro-Wilk test and keeping only the columns with a p-value above .05:
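mtcars %>%
  map(~ shapiro.test(.x)) %>%   # Shapiro-Wilk test for each column
  map("p.value") %>%            # extract the p-value from each test result
  keep(~ .x > .05)              # keep columns where normality is not rejected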
#> $mpg
#> [1] 0.1228814
#>
#> $drat
#> [1] 0.1100608
#>
#> $wt
#> [1] 0.09265499
#>
#> $qsec
#> [1] 0.5935176
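If you only need the column names, for example to plot just those columns afterwards, a slightly more compact sketch uses map_dbl() to get a named numeric vector. The names p_values and normal_cols are made up for this illustration.

```r
library(purrr)

data(mtcars)

# Shapiro-Wilk p-value for each column, as a named numeric vector
p_values <- map_dbl(mtcars, ~ shapiro.test(.x)$p.value)

# Names of the columns where normality is not rejected at the 5% level
normal_cols <- names(p_values)[p_values > .05]
normal_cols
```

With these names, mtcars[normal_cols] can be piped into either of the plotting approaches above. Note that a p-value above .05 only means the test did not reject normality; it is not proof that the column is normally distributed.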