Chapter 19 which() Indices are TRUE?
Welcome back to Quantitative Reasoning!
Last time we learned that the function any() can tell us whether a logical vector contains a TRUE element.
This information is useful, but, if any() returns TRUE, we still don’t know which element in the argument is TRUE.
In this tutorial, we’ll learn that we can find the answer with the function which().
We’ll also learn that we can find the index of the minimum value in a numeric vector with which.min().
Similarly, which.max() returns the index of the maximum value.
Let’s start with a simple example. Here is a logical vector.
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
If we pass z as an argument to the function which(), the output consists of the indices of the elements of z that are TRUE.
which(z)
## [1] 1 2 5
For example, the output of which(z) contains 1 because the first element of z is TRUE.
It also contains 2 because the second element of z is also TRUE, but it doesn’t contain 3 because the third element of z is FALSE.
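One way to picture what which() does is sketched below. This is an illustration, not a description of how R implements the function: we list all positions of z and keep those where z is TRUE. The helper seq_along() and the variable names positions and by_hand are not part of this tutorial; they are introduced only for this example.

```r
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# All positions of z: 1 2 3 4 5
positions <- seq_along(z)

# Keep only the positions where z is TRUE
by_hand <- positions[z]

# Compare the hand-made result with the built-in function
identical(by_hand, which(z))
## [1] TRUE
```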
In practice, which() is usually applied to a much longer vector than our example vector z.
For instance, in our previous tutorial we noticed that the age of some
travellers on the Titanic is unknown, but we couldn’t easily identify where
these travellers are located in the titanic data frame.
Thanks to the which() function, this task is now a piece of cake.
which(is.na(titanic$age))
## [1] 729 1127 1457
The output shows that there are three travellers with unknown age, and they
are in rows 729, 1127 and 1457 of the titanic data frame.
We can use these row indices to find out the family names of these three
travellers.
Let’s first assign the result of the which() function to a variable w, and then use w to subset titanic$fam_name.
w <- which(is.na(titanic$age))
titanic$fam_name[w]
## [1] "GEORGIEV" "KRAEFF" "NISKANEN"
In this example, the detour via which() isn’t really necessary.
We could have found the same information by directly using is.na(titanic$age) in the square brackets after titanic$fam_name.
titanic$fam_name[is.na(titanic$age)]
## [1] "GEORGIEV" "KRAEFF" "NISKANEN"
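The equivalence of the two approaches can be checked on a small scale. The sketch below uses a made-up data frame; the name passengers, its family names and its ages are assumptions for illustration and are not taken from the titanic data.

```r
# Hypothetical miniature data frame; not the real titanic data
passengers <- data.frame(
  fam_name = c("ADAMS", "BAKER", "CLARK", "DAVIS"),
  age = c(31, NA, 4, NA),
  stringsAsFactors = FALSE
)

missing_age <- is.na(passengers$age)  # logical vector; is.na() never returns NA
w <- which(missing_age)               # integer row indices: 2 4

passengers$fam_name[w]
## [1] "BAKER" "DAVIS"

# Subsetting by the logical vector gives the same result as subsetting by w
identical(passengers$fam_name[w], passengers$fam_name[missing_age])
## [1] TRUE
```

The two results agree here because is.na() only ever returns TRUE or FALSE, so the logical vector used for subsetting contains no NA.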
But, if subsetting involves a logical vector that contains NA, the detour via which() can sometimes be worthwhile.
A nice feature of which() is that its output doesn’t contain the indices of NA values.
For example, let’s append NA at the end of the vector z.
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE, NA)
which(z) still returns the same vector as before.
which(z)
## [1] 1 2 5
Suppose we want to use z to subset another vector, for example:
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha", "Warren")
If we subset players by z, the last element of the result is NA.
players[z]
## [1] "Rachel" "Tatyana" "Aisha" NA
By contrast, if we subset players by which(z), the NA element is removed.
players[which(z)]
## [1] "Rachel" "Tatyana" "Aisha"
Neither of these two subsetting methods is better or worse.
They are different, and one should think carefully in each application whether
it’s more appropriate to keep or remove NA in the result.
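The difference between the two methods can be made explicit by comparing the lengths of the results. The sketch below repeats the vectors z and players from above; the names keep_na and drop_na are introduced only for this illustration. It also shows one possible alternative, which is not taken from this tutorial, that drops NA without calling which().

```r
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE, NA)
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha", "Warren")

keep_na <- players[z]         # the NA in z produces an NA in the result
drop_na <- players[which(z)]  # which() skips the NA

length(keep_na)
## [1] 4
length(drop_na)
## [1] 3

# Alternative without which(): require z to be both non-missing and TRUE
identical(players[!is.na(z) & z], drop_na)
## [1] TRUE
```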
Closely related to which() are the functions which.min() and which.max().
which.min() finds the index of the minimum element in the argument, and which.max() finds the index of the maximum.
For example, here is how we find the row in the titanic data frame that contains the oldest traveller.
which.max(titanic$age)
## [1] 1946
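To see both functions side by side, here is a minimal sketch on a short vector. The vector ages and its values are made up for illustration. The example also shows that missing values are discarded when the minimum and maximum are located.

```r
# Hypothetical ages; the third value is missing
ages <- c(34, 61, NA, 2, 47)

which.min(ages)  # position of the smallest non-missing value
## [1] 4
which.max(ages)  # position of the largest non-missing value
## [1] 2

# The index can be used to look up the value itself
ages[which.max(ages)]
## [1] 61
```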
There is one caveat to which.min() and which.max().
They return a single index even if there are multiple equally small minima
or multiple equally large maxima in the data.
For example, the minimum age of travellers is 0.
min(titanic$age, na.rm = TRUE)
## [1] 0
With the sum() function, we can find out how many travellers had this minimum age.
Let’s first assign the result of the min() function to a variable called min_age.
Then we use the == operator to compare every age with min_age and count the TRUE elements with sum().
min_age <- min(titanic$age, na.rm = TRUE)
sum(titanic$age == min_age, na.rm = TRUE)
## [1] 9
We conclude that there were 9 travellers who were still in their first year of life.
which.min() only returns the row index of the first of these travellers.
which.min(titanic$age)
## [1] 25
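The same behaviour can be reproduced on a small scale. In the sketch below, the vector ages is made up for illustration, and the minimum value 0 occurs three times.

```r
# Hypothetical ages in which the minimum 0 occurs three times
ages <- c(5, 0, 12, 0, 0, 30)

sum(ages == min(ages))  # number of elements tied for the minimum
## [1] 3
which.min(ages)         # only the first of the tied positions
## [1] 2
```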
Here’s a small challenge for you.
Can you come up with a method that returns the row indices of all 9 travellers whose age is equal to the minimum?
If you need a hint, take a look at the documentation at ?which.min().
Here is a summary of this tutorial.
- We can find the indices of TRUE elements in a logical vector with the which() function.
- which() eliminates NAs from the result. This feature can sometimes be convenient.
- which.min() returns the index of a minimum element in a numeric vector.
- The corresponding function that finds the index of a maximum element is which.max().
In the next tutorial, we’ll learn how to compute summary statistics of data subsets.
See you soon.