Chapter 19 which() Indices are TRUE?

Welcome back to Quantitative Reasoning! Last time we learned that the function any() can tell us whether a logical vector contains a TRUE element. This information is useful, but, if any() returns TRUE, we still don’t know which element in the argument is TRUE. In this tutorial, we’ll learn that we can find the answer with the function which(). We’ll also learn that we can find the index of the minimum value in a numeric vector with which.min(). Similarly, which.max() returns the index of the maximum value.

Let’s start with a simple example. Here is a logical vector.

z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

If we pass z as an argument to the function which(), the output contains the indices of the elements of z that are TRUE.

which(z)
## [1] 1 2 5

For example, the output contains 1 because the first element of z is TRUE. It also contains 2 because the second element of z is TRUE, but it doesn’t contain 3 because the third element of z is FALSE.
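In most real uses, the logical vector isn’t typed by hand but produced by a comparison. Here is a minimal sketch of that pattern. The vector scores and its values are invented for illustration and aren’t part of the tutorial’s data.

```r
# Hypothetical example data; 'scores' is not from the tutorial's data set.
scores <- c(12, 7, 19, 3, 15)

# The comparison produces a logical vector of the same length as scores.
high <- scores > 10
high
## [1]  TRUE FALSE  TRUE FALSE  TRUE

# which() converts that logical vector into the positions of its TRUE elements.
high_idx <- which(high)
high_idx
## [1] 1 3 5
```

The result is an integer vector, so it can be stored and reused for subsetting several vectors of the same length.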

In practice, which() is usually applied to a much longer vector than our example vector z. For instance, in our previous tutorial we noticed that the age of some travellers on the Titanic is unknown, but we couldn’t easily identify where these travellers are located in the titanic data frame. Thanks to the which() function, this task is now a piece of cake.

which(is.na(titanic$age))
## [1]  729 1127 1457

The output shows that there are three travellers with unknown age, and they are in rows 729, 1127 and 1457 of the titanic data frame. We can use these row indices to find out the family names of these three travellers. Let’s first assign the result of the which() function to a variable w, and then use w to subset titanic$fam_name.

w <- which(is.na(titanic$age))
titanic$fam_name[w]
## [1] "GEORGIEV" "KRAEFF"   "NISKANEN"

In this example, the detour via which() isn’t really necessary. We could have found the same information by directly using is.na(titanic$age) in the square brackets after titanic$fam_name.

titanic$fam_name[is.na(titanic$age)]
## [1] "GEORGIEV" "KRAEFF"   "NISKANEN"
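The equivalence of the two routes can be checked on a small scale. The sketch below uses a made-up data frame called toy as a stand-in for titanic. Its names and ages are invented, so treat it as an illustration rather than real passenger data.

```r
# Hypothetical toy data frame standing in for titanic; all values are invented.
toy <- data.frame(
  fam_name = c("ADAMS", "BROWN", "CLARK", "DAVIS"),
  age = c(34, NA, 51, NA),
  stringsAsFactors = FALSE
)

# Route 1: convert the logical vector to indices first.
via_which <- toy$fam_name[which(is.na(toy$age))]

# Route 2: subset with the logical vector directly.
via_logical <- toy$fam_name[is.na(toy$age)]

# is.na() returns only TRUE or FALSE, never NA, so both routes agree.
identical(via_which, via_logical)
## [1] TRUE
```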

But, if subsetting involves a logical vector that contains NA, the detour via which() can sometimes be worthwhile. A nice feature of which() is that its output doesn’t contain the indices of NA values. For example, let’s append NA at the end of the vector z.

z <- c(TRUE, TRUE, FALSE, FALSE, TRUE, NA)

which(z) still returns the same vector as before.

which(z)
## [1] 1 2 5

Suppose we want to use z to subset another vector, for example

players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha", "Warren")

If we subset players by z, the last element is NA.

players[z]
## [1] "Rachel"  "Tatyana" "Aisha"   NA

By contrast, if we subset players by which(z), the NA element is removed.

players[which(z)]
## [1] "Rachel"  "Tatyana" "Aisha"

Neither of these two subsetting methods is better or worse. They are different, and one should think carefully in each application whether it’s more appropriate to keep or remove NA in the result.
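The same difference shows up when we subset the rows of a data frame by a condition on a column that has missing values. Here is a minimal sketch with an invented data frame. The name toy, the column names and all values are assumptions made for illustration.

```r
# Hypothetical toy data; names and ages are invented for illustration.
toy <- data.frame(
  name = c("Rachel", "Tatyana", "Noah", "Quentin"),
  age = c(31, NA, 19, 45),
  stringsAsFactors = FALSE
)

# The comparison is NA wherever age is NA.
over_30 <- toy$age > 30

# Logical subsetting keeps a placeholder row of NAs for the unknown age.
rows_logical <- toy[over_30, ]
nrow(rows_logical)
## [1] 3

# Subsetting by which() keeps only rows where the condition is known to be TRUE.
rows_which <- toy[which(over_30), ]
nrow(rows_which)
## [1] 2
```

Whether the extra NA row is a useful reminder of missing data or a nuisance depends on the analysis, which is why neither method is preferable in general.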

Closely related to which() are the functions which.min() and which.max(). which.min() finds the index of the minimum element in the argument, and which.max() finds the index of the maximum. For example, here is how we find the row in the titanic data frame that contains the oldest traveller.

which.max(titanic$age)
## [1] 1946

There is one caveat to which.min() and which.max(). They return a single index, that of the first occurrence, even if there are multiple equally small minima or multiple equally large maxima in the data. For example, the minimum age of travellers is 0.

min(titanic$age, na.rm = TRUE)
## [1] 0

With the sum() function, we can find out how many travellers had this minimum age. Let’s first assign the result of the min() function to a variable called min_age. Then we use the == operator.

min_age <- min(titanic$age, na.rm = TRUE)
sum(titanic$age == min_age, na.rm = TRUE)
## [1] 9

We conclude that there were 9 travellers who were still in their first year of life. which.min() returns the row index of only one of these travellers.

which.min(titanic$age)
## [1] 25
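The behaviour with ties is easier to see on a short vector. In the sketch below, the vector x is invented for illustration. It contains a tied minimum, a tied maximum and one NA.

```r
# Hypothetical example vector with a tied minimum (1) and a tied maximum (5).
x <- c(3, 1, 5, 1, 5, NA)

# Only the position of the first minimum is returned; the NA is ignored.
first_min <- which.min(x)
first_min
## [1] 2

# Likewise, only the position of the first maximum is returned.
first_max <- which.max(x)
first_max
## [1] 3
```

The second occurrences at positions 4 and 5 are not reported.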

Here’s a small challenge for you. Can you come up with a method that returns the row indices of all 9 travellers whose age is equal to the minimum? If you need a hint, take a look at the documentation at ?which.min.

Here is a summary of this tutorial.

  • We can find the indices of TRUE elements in a logical vector with the which() function.
  • which() eliminates NAs from the result. This feature can sometimes be convenient.
  • which.min() returns the index of a minimum element in a numeric vector.
  • The corresponding function that finds the index of a maximum element is which.max().

In the next tutorial, we’ll learn how to compute summary statistics of data subsets.

See you soon.