Chapter 8 Logical Vectors

Welcome back to Quantitative Reasoning! In the previous tutorials, we worked with a data frame that contains information about passengers and crew members on the Titanic. Among other things, we learned how to build subsets of this data frame. For example, we used this command in tutorial 06 to generate a data frame that contained only 40-year-old passengers.

titanic[titanic$age == 40, ]

In this video, I want to give some background information about what exactly this command does. We’ll learn that titanic$age == 40 is an example of a logical vector. We’ll find out what a logical vector is and how we can use logical vectors for subsetting. We’ll also learn how we can generate logical vectors with comparison operators.

Let’s go to the titanic project and start a new script with the menu option “File” -> “New File” -> “R Script”. Let’s call the script explore_logical.R.

A logical vector contains only elements that are TRUE or FALSE. We can use the c() function to combine multiple logical elements into a vector, for example

z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

Note that TRUE and FALSE must be spelled with capital letters. In the Environment tab, the data class of z is abbreviated as “logi”. We can use a logical vector to build subsets of other vectors of the same length. For example, let’s define a character vector called players that contains “Rachel”, “Tatyana”, “Noah”, “Quentin” and “Aisha”. I’m going to line up the elements in z and players for later convenience. R is largely insensitive to the exact placement of spaces, and here I’m taking advantage of this feature for demonstration purposes.

z <-       c(TRUE,     TRUE,      FALSE,  FALSE,     TRUE)
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
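What the Environment tab shows can also be checked from the console. Here is a minimal sketch that uses the base R functions str(), class() and length(), which we haven’t used in this tutorial so far:

```r
z <-       c(TRUE,     TRUE,      FALSE,  FALSE,     TRUE)
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")

# str() prints the same abbreviated class that the Environment tab shows
str(z)
##  logi [1:5] TRUE TRUE FALSE FALSE TRUE

# class() spells the class out in full
class(z)
## [1] "logical"

# Both vectors have five elements, so z can be used to subset players
length(z) == length(players)
## [1] TRUE
```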

Let’s click “Run” so that the vector players is in the environment. If we subset the vector players by z, we get only those elements of players whose corresponding element in z is TRUE.

players[z]
## [1] "Rachel"  "Tatyana" "Aisha"

For example, the result contains "Rachel" because the first element in z is TRUE. The result also contains "Tatyana" because the second element in z is also TRUE. But the result doesn’t contain "Noah" because the third element in z is FALSE. And so on. At this stage, we might be wondering why subsetting with logical vectors would be useful at all. Couldn’t we simply use the method we learned in tutorial 02, namely subsetting with a numeric vector?

players[c(1, 2, 5)]
## [1] "Rachel"  "Tatyana" "Aisha"
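The two approaches are closely connected: the positions 1, 2 and 5 are exactly the positions where z is TRUE. As a sketch, the base R function which(), which isn’t covered in this tutorial, returns those positions for us:

```r
z <-       c(TRUE,     TRUE,      FALSE,  FALSE,     TRUE)
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")

# which() returns the positions at which a logical vector is TRUE
which(z)
## [1] 1 2 5

# Logical and numeric subsetting select the same players
identical(players[z], players[c(1, 2, 5)])
## [1] TRUE
```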

In practice, it turns out that subsetting with logical vectors is often easier than determining the numeric indices that we want to keep in the subset. To see why, we first have to learn how we typically generate logical vectors from numeric or character data.

Logical vectors are a natural outcome when we make comparisons. Suppose the scores of our five players are 5, 6, 1, 6 and 3.

scores <- c(5, 6, 1, 6, 3)

R can tell us which of these scores is equal to 6, with the command

scores == 6
## [1] FALSE  TRUE FALSE  TRUE FALSE

The result is a logical vector of the same length as scores. That is, it has five elements. We get TRUE at those indices where the score was indeed 6, namely at the second and fourth position. For all other values we get FALSE.
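The comparison is made element by element. One way to picture it, offered as a sketch rather than a description of R’s internals, is that the single number 6 is compared with each score in turn, as if we had written it out five times. The names short_form and long_form are made up for this example.

```r
scores <- c(5, 6, 1, 6, 3)

# Comparing with a single number ...
short_form <- scores == 6

# ... gives the same result as comparing with a vector of five sixes
long_form <- scores == c(6, 6, 6, 6, 6)

identical(short_form, long_form)
## [1] TRUE

# The result has one element per score
length(short_form)
## [1] 5
```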

If we now want to know the names of the players with a score of 6, we can simply use the result of scores == 6 to subset the vector players.

players[scores == 6]
## [1] "Tatyana" "Quentin"

So the lucky ones were Tatyana and Quentin.
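The logical vector doesn’t have to be created inside the square brackets. As a sketch, we can store it under a name first and then use it for subsetting. The name has_six is a hypothetical one chosen for this example.

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
scores <- c(5, 6, 1, 6, 3)

# Store the result of the comparison as its own logical vector
has_six <- scores == 6
has_six
## [1] FALSE  TRUE FALSE  TRUE FALSE

# Subsetting with the stored vector gives the same players as before
players[has_six]
## [1] "Tatyana" "Quentin"
```

Storing the comparison under a descriptive name can make longer scripts easier to read, because the condition is written only once.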

The two consecutive equals signs are an example of a comparison operator. Another comparison operator is an exclamation mark followed by an equals sign. While the == operator returns TRUE if and only if both sides of the operator are equal, the != operator returns TRUE if and only if both sides are not equal.

players[scores != 6]
## [1] "Rachel" "Noah"   "Aisha"
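Because every score here is either equal to 6 or not equal to 6, the two operators split the players into two groups with no overlap. Here is a short check, where with_six and without_six are names made up for this example and intersect() is a base R function we haven’t used in this tutorial:

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
scores <- c(5, 6, 1, 6, 3)

with_six    <- players[scores == 6]
without_six <- players[scores != 6]

# Together, the two groups account for all five players
length(with_six) + length(without_six) == length(players)
## [1] TRUE

# No player appears in both groups
intersect(with_six, without_six)
## character(0)
```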

Other inequality operators use the less-than symbol or the greater-than symbol, possibly in combination with an equals sign. For example, to find the players with a score less than 5, we type

players[scores < 5]
## [1] "Noah"  "Aisha"

To find the players with a score less than or equal to 5, we write

players[scores <= 5]
## [1] "Rachel" "Noah"   "Aisha"
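The greater-than operators work in the same way. The sketch below applies > and >= to the same scores of 5, 6, 1, 6 and 3:

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")
scores <- c(5, 6, 1, 6, 3)

# Players with a score greater than 5
players[scores > 5]
## [1] "Tatyana" "Quentin"

# Players with a score greater than or equal to 5
players[scores >= 5]
## [1] "Rachel"  "Tatyana" "Quentin"
```

Rachel, whose score is exactly 5, appears only in the second result. That is the difference the equals sign makes.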

Here we used the comparison operators for numeric comparisons. The operators also work with character vectors, where they refer to alphabetic order. For example, "Gastner" > "Einstein" is of course TRUE, but unfortunately only because G comes later in alphabetic order than E.
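Comparisons on character vectors can be used for subsetting in the same way. Here is a sketch with the players vector. Note that the exact ordering of character strings depends on the language settings (the locale) that R is running under, so the outputs shown assume a typical English-language setup.

```r
players <- c("Rachel", "Tatyana", "Noah", "Quentin", "Aisha")

# A single comparison of two character strings
"Gastner" > "Einstein"
## [1] TRUE

# Players whose names come before "R" in alphabetic order
players[players < "R"]
## [1] "Noah"    "Quentin" "Aisha"
```

"Rachel" isn’t in the result because a longer string that starts with "R" comes after "R" itself.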

Let’s return to our titanic.R script. If the titanic data frame isn’t currently in your environment, please run the read.csv() command again.

titanic <- read.csv("~/QR/titanic/titanic.csv")

The command

titanic[titanic$age == 40, ]

is similar to the example we’ve just seen. The comma in the square brackets makes this expression look a little bit more complicated. But bear in mind that titanic is a data frame and, for this reason, it has rows and columns. In tutorial 06, we learned that the expression before the comma refers to the rows. The blank space after the comma means that we keep all columns.

In the highlighted command, we use the comparison operator == to return a logical vector that is TRUE if and only if titanic$age equals 40. As a result, we keep exactly those rows of the data frame titanic for which the corresponding person was exactly 40 years old.
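We can watch the same mechanics at work on a small stand-in data frame. The data frame toy below is invented purely for illustration: its names, ages and column names are not taken from titanic.csv.

```r
# Hypothetical toy data, not the real titanic data
toy <- data.frame(
  name = c("Ann", "Ben", "Cho", "Dev"),
  age  = c(40, 23, 40, 61)
)

# The comparison gives a logical vector with one element per row
toy$age == 40
## [1]  TRUE FALSE  TRUE FALSE

# Keep the rows where the logical vector is TRUE, and keep all columns
toy[toy$age == 40, ]
##   name age
## 1  Ann  40
## 3  Cho  40
```

The row numbers 1 and 3 in the output are the positions of the TRUE values in the logical vector.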

Let’s summarise what we’ve learned in this tutorial.

  • Logical vectors contain elements that can only be either TRUE or FALSE.
  • We can subset a vector by placing a logical vector inside square brackets.
  • Similarly, we can keep only those rows of a data frame that satisfy a certain condition by placing a logical vector inside square brackets, followed by a comma.
  • We can generate a logical vector with the comparison operators ==, !=, <, >, <= and >=.

In the next tutorial, we’ll learn more about logical vectors.

See you soon.