Chapter 8 Logical Vectors
Welcome back to Quantitative Reasoning! In the previous tutorials, we worked with a data frame that contains information about passengers and crew members on the Titanic. Among other things, we learned how to build subsets of this data frame. For example, we used this command in tutorial 06 to generate a data frame that contained only 40-year-old passengers.
titanic[titanic$age == 40, ]
In this video, I want to give some background information about what exactly
this command does.
We’ll learn that titanic$age == 40
is an example of a logical vector.
We’ll find out what a logical vector is and how we can use logical vectors for
subsetting.
We’ll also learn how we can generate logical vectors with comparison operators.
Let’s go to the titanic
project and start a new script with the menu option
“File” -> “New File” -> “R Script”.
Let’s call the script explore_logical.R.
A logical vector contains only elements that are TRUE or FALSE.
We can use the c()
function to combine multiple logical elements into a
vector, for example
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
Note that TRUE and FALSE must be spelled with capital letters.
In the Environment tab, the data class of z
is abbreviated as “logi”.
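We can also ask R for the class directly instead of reading it from the Environment tab. The following is a minimal sketch that uses only the base R functions class(), str() and length() on the vector z defined above.

```r
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

class(z)    # the data class of z: "logical"
str(z)      # compact summary that starts with the abbreviation "logi"
length(z)   # number of elements in z: 5
```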
We can use a logical vector to build subsets of other vectors that have the
same length.
For example, let’s define a character vector called players
that contains
“Rachel”, “Tatyana”, “Noah”, “Quentin” and “Aisha”.
I’m going to line up the elements in z
and players
for later convenience.
R is largely insensitive to the exact placement of spaces.
Here I’m taking advantage of this feature for demonstration purposes.
z       <- c(TRUE,     TRUE,      FALSE,  FALSE,     TRUE)
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
Let’s click “Run” so that the vector players
is in the environment.
If we subset the vector players by z, we get only those elements of players
whose corresponding value in z is TRUE.
players[z]
## [1] "Rachel" "Tatyana" "Aisha"
For example, the result contains "Rachel" because the first element in z is
TRUE.
The result also contains "Tatyana" because the second element in z is also
TRUE.
But the result doesn’t contain "Noah" because the third element in z is FALSE.
And so on.
At this stage, we might be wondering why subsetting with logical vectors is
useful at all.
Couldn’t we simply use the method we learned in tutorial 02, namely subsetting
with a numeric vector?
players[c(1, 2, 5)]
## [1] "Rachel" "Tatyana" "Aisha"
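To convince ourselves that both approaches select the same elements, we can compare the two results directly. This is a small sketch; the names by_logical and by_numeric are chosen here for illustration only, and which() is a base R function that lists the positions of the TRUE elements.

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
z       <- c(TRUE,     TRUE,      FALSE,  FALSE,     TRUE)

by_logical <- players[z]            # keep elements where z is TRUE
by_numeric <- players[c(1, 2, 5)]   # keep elements at positions 1, 2 and 5

identical(by_logical, by_numeric)
## [1] TRUE

# which() returns the positions of the TRUE elements of a logical vector.
which(z)
## [1] 1 2 5
```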
In practice, it turns out that subsetting with logical vectors is often easier than determining the numeric indices that we want to keep in the subset. To see why, we first have to learn how we typically generate logical vectors from numeric or character data.
Logical vectors are a natural outcome when we make comparisons. Suppose the scores of our five players are 5, 6, 1, 6 and 3.
scores <- c(5, 6, 1, 6, 3)
R can tell us which of these scores is equal to 6, with the command
scores == 6
## [1] FALSE TRUE FALSE TRUE FALSE
The result is a logical vector of the same length as scores.
That is, it has five elements.
We get TRUE at those indices where the score was indeed 6, namely at the
second and fourth position.
For all other values we get FALSE.
If we now want to know the names of the players with a score of 6, we can
simply use the result of scores == 6 to subset the vector players.
players[scores == 6]
## [1] "Tatyana" "Quentin"
So the lucky ones were Tatyana and Quentin.
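It can help to see the two steps separately: first the comparison, then the subsetting. In the sketch below, is_six is a name introduced only for this illustration. The last line relies on the fact that R counts TRUE as 1 and FALSE as 0 when summing.

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
scores  <- c(5, 6, 1, 6, 3)

# Step 1: the comparison produces a logical vector.
is_six <- scores == 6
is_six
## [1] FALSE  TRUE FALSE  TRUE FALSE

# Step 2: the logical vector selects the matching players.
players[is_six]
## [1] "Tatyana" "Quentin"

# Number of players with a score of 6.
sum(is_six)
## [1] 2
```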
The two consecutive equals signs are an example of a comparison operator.
Another comparison operator is an exclamation mark followed by an equals sign.
While the == operator returns TRUE if and only if both sides of the operator
are equal, the != operator returns TRUE if and only if both sides are not
equal.
players[scores != 6]
## [1] "Rachel" "Noah" "Aisha"
Other inequality operators use the less-than symbol or the greater-than symbol, possibly in combination with an equals sign. For example, to find the players with a score less than 5, we type
players[scores < 5]
## [1] "Noah" "Aisha"
To find the players with a score less than or equal to 5, we write
players[scores <= 5]
## [1] "Rachel" "Noah" "Aisha"
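The greater-than operators work in the same way. As a short sketch with the same players and scores vectors:

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
scores  <- c(5, 6, 1, 6, 3)

# Players with a score strictly greater than 5.
players[scores > 5]
## [1] "Tatyana" "Quentin"

# Players with a score greater than or equal to 5.
players[scores >= 5]
## [1] "Rachel"  "Tatyana" "Quentin"
```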
Here we used the comparison operators for numeric comparisons.
The operators also work with character vectors, where they refer to
alphabetical order.
For example, "Gastner" > "Einstein" is of course TRUE, but unfortunately
only because G comes later in alphabetical order than E.
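Here is a small sketch of character comparisons with our players vector. The cutoff "P" is chosen arbitrarily for illustration. Keep in mind that R sorts character strings according to the locale of your computer, so unusual characters or mixed upper and lower case may be ordered differently on different machines.

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")

# A single comparison of two character strings.
"Gastner" > "Einstein"
## [1] TRUE

# Players whose names come before "P" in alphabetical order.
players[players < "P"]
## [1] "Noah"  "Aisha"
```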
Let’s return to our titanic.R script.
If the titanic
data frame isn’t currently in your environment, please run the
read.csv()
command again.
titanic <- read.csv("~/QR/titanic/titanic.csv")
The command
titanic[titanic$age == 40, ]
is similar to the example we’ve just seen.
The comma in the square brackets makes this expression look a little bit more
complicated.
But bear in mind that titanic
is a data frame and, for this reason, it has
rows and columns.
In tutorial 06, we learned that the expression before the comma refers to the
rows.
The blank space after the comma means that we keep all columns.
In the highlighted command, we use the comparison operator == to return a
logical vector whose elements are TRUE exactly for those rows where
titanic$age equals 40.
As a result, we keep exactly those rows of the data frame titanic
for which
the corresponding person was exactly 40 years old.
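If you want to try row subsetting without loading the CSV file, the following sketch uses a tiny made-up data frame. The data frame toy, its columns and its values are invented for illustration and are not part of the Titanic data.

```r
# Hypothetical stand-in for the titanic data frame; all values are made up.
toy <- data.frame(
  name = c("Person A", "Person B", "Person C", "Person D"),
  age  = c(40, 23, 40, 35)
)

# The comparison gives one logical value per row.
is_40 <- toy$age == 40
is_40
## [1]  TRUE FALSE  TRUE FALSE

# Logical vector before the comma: keep rows 1 and 3.
# Nothing after the comma: keep all columns.
forty <- toy[is_40, ]
nrow(forty)
## [1] 2
```

Writing toy[toy$age == 40, ] in one step gives the same result as toy[is_40, ].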
Let’s summarise what we’ve learned in this tutorial.
- Logical vectors contain elements that can only be either TRUE or FALSE.
- We can subset a vector by placing a logical vector inside square brackets.
- Similarly we can filter out those rows in a data frame that satisfy a certain condition by placing a logical vector inside square brackets followed by a comma.
- We can generate a logical vector with the comparison operators ==, !=, <, >, <= and >=.
In the next tutorial, we’ll learn more about logical vectors.
See you soon.