# Load packages

`library(tidyverse)`

# Basic testing for equality

Testing for equality is a very basic operation in computer (and data) science. R has a straightforward function to test whether two objects are equal:

```
identical(1, 1)
#> [1] TRUE
identical("A", "A")
#> [1] TRUE
identical(1, 2)
#> [1] FALSE
identical(1, NA)
#> [1] FALSE
```
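
It may help to see where `identical` is stricter than the `==` operator. The following is a small sketch using only base R; the specific comparisons were chosen here for illustration.

```r
# identical() compares type as well as value: integer 1L vs. double 1
identical(1L, 1)
#> [1] FALSE
1L == 1
#> [1] TRUE

# identical() always returns a single TRUE or FALSE, even for missing values
identical(NA, NA)
#> [1] TRUE
NA == NA
#> [1] NA

# identical() compares whole vectors, whereas == compares element by element
identical(c(1, 2), c(1, 2))
#> [1] TRUE
c(1, 2) == c(1, 3)
#> [1]  TRUE FALSE
```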

However, this gets more complicated if we want to compare more than two elements. One way to achieve this is to count the distinct values. If there is only one distinct value, then all elements are the same. When comparing two vectors, we can at least say that if they contain different numbers of distinct values, the vectors are not the same.

```
x <- c(1, 1, 1)
y <- c(1, 1, 99)
z <- c(1, 1, 1)
length(unique(x))
#> [1] 1
length(unique(y))
#> [1] 2
```
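
The counting idea can be wrapped in a small helper. This is a minimal sketch; the function name `all_same` is hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if a vector holds exactly one distinct value
all_same <- function(v) {
  length(unique(v)) == 1
}

x <- c(1, 1, 1)
y <- c(1, 1, 99)

all_same(x)
#> [1] TRUE
all_same(y)
#> [1] FALSE
```

Note that `unique` treats `NA` as a value of its own, so `all_same(c(1, NA, 1))` returns `FALSE`.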

This approach extends to comparing two vectors: if their numbers of distinct values differ, the vectors cannot be identical.

```
length(unique(x)) == length(unique(y))
#> [1] FALSE
```

The vectors `x` and `y` are not identical.

# Be aware

```
x2 <- c(99, 99, 99)
length(unique(x)) == length(unique(x2))
#> [1] TRUE
```

This comparison returns `TRUE`, but the vectors are different.

```
identical(x, x2) # different vectors
#> [1] FALSE
identical(x, z) # identical vectors
#> [1] TRUE
```

# Testing columnwise in a data frame

Let’s take this method to a data frame.

```
d <- tribble(
  ~colA, ~colB, ~colC,
  1, 1, 1,
  1, 1, 99,
  1, NA, 1
)
```

First, we compute the number of different values per column:

```
d %>% summarise_all(list( ~ length(unique(.))))
#> # A tibble: 1 x 3
#> colA colB colC
#> <int> <int> <int>
#> 1 1 2 2
```

Then we can check whether all are identical:

```
d2 <- d %>%
  summarise_all(list( ~ length(unique(.))))
```

Often, when working with rowwise operations, it is helpful to reshape the data frame first, as columnwise operations are easier. `gather` does this by stacking all columns into one `key` column and one `value` column.

```
d2 %>%
  gather() %>%
  summarise(length(unique(value)))
#> # A tibble: 1 x 1
#> `length(unique(value))`
#> <int>
#> 1 2
```

The number of distinct counts is greater than one (namely 2), which tells us that the columns do not all contain the same number of distinct values, so they cannot all be identical.
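
In more recent versions of dplyr and tidyr, `summarise_all` and `gather` are marked as superseded. The following sketch shows the same computation, assuming dplyr 1.0.0 or later (for `across`) and tidyr 1.0.0 or later (for `pivot_longer`). The names `column`, `n_unique` and `n_counts` are chosen here for illustration.

```r
library(tidyverse)

d <- tribble(
  ~colA, ~colB, ~colC,
  1, 1, 1,
  1, 1, 99,
  1, NA, 1
)

# Number of distinct values per column (NA counts as a value, as with unique())
d2 <- d %>%
  summarise(across(everything(), n_distinct))

# Reshape to long format, then count how many different counts there are
n_counts <- d2 %>%
  pivot_longer(everything(), names_to = "column", values_to = "n_unique") %>%
  summarise(n = n_distinct(n_unique)) %>%
  pull(n)

n_counts
#> [1] 2
```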

# Testing rowwise in a data frame

Let’s say we want to know whether row 1 contains only identical elements, whether row 2 does, and so on. In short, we test for equality rowwise in this data frame.

```
d3 <- d %>%
  mutate(concatenated = pmap(., c)) %>%
  mutate(length_unique = map_int(concatenated, ~ length(unique(.))))
d3
#> # A tibble: 3 x 5
#> colA colB colC concatenated length_unique
#> <dbl> <dbl> <dbl> <list> <int>
#> 1 1 1 1 <dbl [3]> 1
#> 2 1 1 99 <dbl [3]> 2
#> 3 1 NA 1 <dbl [3]> 2
```

Let’s deconstruct that to get a grip on it. The first `mutate` call simply combines the values of all columns into one vector per row. That is, `1, 1, 1` for the first row, `1, 1, 99` for the second row, and so on. The second `mutate` call counts the distinct values in each of these vectors.

To access the list column `concatenated`, use list indexing:

```
d3[["concatenated"]]
#> [[1]]
#> colA colB colC
#> 1 1 1
#>
#> [[2]]
#> colA colB colC
#> 1 1 99
#>
#> [[3]]
#> colA colB colC
#> 1 NA 1
```

Each element is a simple (named) vector:

```
d3[["concatenated"]] %>% str()
#> List of 3
#> $ : Named num [1:3] 1 1 1
#> ..- attr(*, "names")= chr [1:3] "colA" "colB" "colC"
#> $ : Named num [1:3] 1 1 99
#> ..- attr(*, "names")= chr [1:3] "colA" "colB" "colC"
#> $ : Named num [1:3] 1 NA 1
#> ..- attr(*, "names")= chr [1:3] "colA" "colB" "colC"
```
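
As an alternative to building a list column first, the rowwise count can also be computed directly. This sketch assumes dplyr 1.0.0 or later, where `rowwise` and `c_across` are available; it is one possible approach, and the name `d_rowwise` is chosen here for illustration.

```r
library(tidyverse)

d <- tribble(
  ~colA, ~colB, ~colC,
  1, 1, 1,
  1, 1, 99,
  1, NA, 1
)

d_rowwise <- d %>%
  rowwise() %>%  # evaluate the following mutate() once per row
  mutate(length_unique = n_distinct(c_across(colA:colC))) %>%
  ungroup()      # return to a regular, ungrouped tibble

d_rowwise$length_unique
#> [1] 1 2 2
```

As with `unique`, `n_distinct` counts `NA` as a value by default, which is why row 3 yields 2.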

# Ignoring NAs rowwise

purrr provides a function called `discard`, which drops those elements of a list or vector that match a condition such as `is.na`:

```
d$colB %>% discard(is.na)
#> [1] 1 1
```

Let’s apply that to our list column `concatenated`:

```
d4 <- d3 %>%
  mutate(c_nona = map(concatenated, ~ discard(., is.na)))
d4
#> # A tibble: 3 x 6
#> colA colB colC concatenated length_unique c_nona
#> <dbl> <dbl> <dbl> <list> <int> <list>
#> 1 1 1 1 <dbl [3]> 1 <dbl [3]>
#> 2 1 1 99 <dbl [3]> 2 <dbl [3]>
#> 3 1 NA 1 <dbl [3]> 2 <dbl [2]>
```

In the third row of `d` there was one missing value, so the vector in row 3 should be shorter:

```
d4$c_nona[[3]]
#> colA colC
#> 1 1
```

With the missing values removed, we count the distinct values per row again:

```
d5 <- d4 %>%
  mutate(lu2 = map_int(c_nona, ~ length(unique(.))))
d5
#> # A tibble: 3 x 7
#> colA colB colC concatenated length_unique c_nona lu2
#> <dbl> <dbl> <dbl> <list> <int> <list> <int>
#> 1 1 1 1 <dbl [3]> 1 <dbl [3]> 1
#> 2 1 1 99 <dbl [3]> 2 <dbl [3]> 2
#> 3 1 NA 1 <dbl [3]> 2 <dbl [2]> 1
```

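Note that the usual way of working with `mutate` in dplyr does not give the intended result with list columns. `mutate` passes the whole column (here, a list of three vectors) to the expression at once, so `length(unique(...))` counts distinct list elements rather than distinct values within each row: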

```
d4 %>%
  mutate(lu2 = length(unique(c_nona)))
#> # A tibble: 3 x 7
#> colA colB colC concatenated length_unique c_nona lu2
#> <dbl> <dbl> <dbl> <list> <int> <list> <int>
#> 1 1 1 1 <dbl [3]> 1 <dbl [3]> 3
#> 2 1 1 99 <dbl [3]> 2 <dbl [3]> 3
#> 3 1 NA 1 <dbl [3]> 2 <dbl [2]> 3
```

Each element of a list column is itself a vector with several values; that’s why we need `map` and friends.
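
The two steps (dropping `NA`s, then counting distinct values) can also be combined into one pass. A minimal sketch, assuming the same data frame `d`; the helper name `n_unique_nona` and the result name `d_nona` are hypothetical.

```r
library(tidyverse)

d <- tribble(
  ~colA, ~colB, ~colC,
  1, 1, 1,
  1, 1, 99,
  1, NA, 1
)

# Hypothetical helper: count distinct values after dropping NAs
n_unique_nona <- function(v) {
  length(unique(discard(v, is.na)))
}

# pmap_int() walks over the columns in parallel, i.e. row by row
d_nona <- d %>%
  mutate(lu2 = pmap_int(list(colA, colB, colC),
                        function(...) n_unique_nona(c(...))))

d_nona$lu2
#> [1] 1 2 1
```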

# Limitations

Note that this approach only works for testing “one-equality”, i.e., whether all values are the same. If you want to test, for example, whether a target vector [1,2] is identical to the reference vector [3,4], counting the distinct items will not work. Both vectors contain 2 distinct values, but they are not identical. In other words, differing counts prove *inequality*, but equal counts do not prove *equality*.

In addition, using this method, only *two* vectors can be compared at a time.
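
To make the limitation concrete, the sketch below contrasts the counting approach with `identical` for the two vectors mentioned above. It also shows one possible way to compare several vectors against a reference using base R’s `vapply`. The variable names are chosen here for illustration.

```r
target <- c(1, 2)
reference <- c(3, 4)

# Equal counts of distinct values, although the vectors differ
length(unique(target)) == length(unique(reference))
#> [1] TRUE
identical(target, reference)
#> [1] FALSE

# Compare several vectors against the first one
vecs <- list(c(1, 1, 1), c(1, 1, 1), c(1, 1, 99))
same_as_first <- vapply(vecs[-1], identical, logical(1), vecs[[1]])
same_as_first
#> [1]  TRUE FALSE
all(same_as_first)
#> [1] FALSE
```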