Chapter 35 for-loops

Welcome back to Quantitative Reasoning! Computers, unlike humans, don’t mind repeating tasks over and over again. Syntactic features of a programming language that cause a computer to repeat one or several instructions are called loops. R has different types of loops suitable for different applications. Since tutorial 7, we’ve seen many instances where vectorisation carries out a repeated task. For example, in the command c(2, 4, 7, 6) + c(5, 7, 3, 9), the vectorised + operator carries out a loop over the four elements in each argument. In each iteration (i.e. for each element in the loop), R carries out one addition.

c(2, 4, 7, 6) + c(5, 7, 3, 9)
## [1]  7 11 10 15

Some preinstalled functions also implicitly execute loops. For example, we can simulate ten rolls of a die with sample().

sample(6, size = 10, replace = TRUE)
##  [1] 5 4 3 4 3 6 1 3 5 3

In this case, the loop has ten iterations, one for each die roll. Vectorised operators and preinstalled functions are the fastest and most elegant techniques to implement loops in R, but there are limits to these techniques when we want to iterate complex procedures. In this tutorial, we’ll learn how to cope with such situations by using for-loops.

We can use a for-loop when we have a vector, say v, and want to perform some action for each of v’s elements. The general syntax is as follows.

for (i in v) {
  # Command(s) to be repeated for all values i in v
}

Here is a simple example of a for-loop.

for (i in c("Our", "first", "for-loop")) {
  print(i)
}
## [1] "Our"
## [1] "first"
## [1] "for-loop"

In the code block between the curly braces, i is given the values in the vector c("Our", "first", "for-loop") one after another from the first to the last element. The command print(i) shows the value of i in each iteration as console output. We don’t need to give the “iterator” variable the name “i” as long as we consistently change its name everywhere in the for-loop. For example, we may want to replace “i” by the more descriptive variable name “word”. The code produces the same output as before.

for (word in c("Our", "first", "for-loop")) {
  print(word)
}
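
A for-loop can also run over positions rather than values. As a rough sketch, here is the vectorised addition from the start of this tutorial written as an explicit for-loop. The variable names a, b, total and k are chosen just for this illustration.

```r
a <- c(2, 4, 7, 6)
b <- c(5, 7, 3, 9)

# Reserve space for the four sums (a vector of four zeros)
total <- numeric(length(a))

# seq_along(a) is the vector 1, 2, 3, 4
for (k in seq_along(a)) {
  total[k] <- a[k] + b[k]   # One addition per iteration
}
print(total)
## [1]  7 11 10 15
```

The result is the same as with a + b, which is shorter and faster. The loop only makes visible what the vectorised operator does for us.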

We can also do more interesting things than printing out all values. For example, we can repeat simulations of a random experiment to estimate the probability of certain outcomes. Let’s consider drawing five random cards from a standard deck of 52 poker cards. The deck contains 13 cards from each of the four suits: diamonds, clubs, spades and hearts. How many different suits do we get in a random draw of 5 cards? Let’s simulate the draws with R and tally the results. We first generate a deck of cards with the rep() function from our previous tutorial.

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

Then we initialise a vector obs of length four, in which we count how often the four possible outcomes occur when we repeatedly draw cards randomly (i.e. obs[n] stores the number of trials in which we draw n different suits). We can initialise obs with rep(0, 4) or numeric(4) to ensure that we start counting from zero.

# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
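
As a quick check, and only as a sketch, we can confirm that both ways of initialising give the same vector. The names obs_a and obs_b are made up for this comparison.

```r
obs_a <- rep(0, 4)    # Four zeros with rep()
obs_b <- numeric(4)   # Four zeros with numeric()
print(obs_a)
## [1] 0 0 0 0
identical(obs_a, obs_b)
## [1] TRUE
```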

The next code element is a for-loop in which we repeatedly draw five random cards. If we want to repeat the random draw 10,000 times, we can start the for-loop with for (i in 1:10000). In each of the 10,000 iterations, we draw five cards and count the number of suits. Let’s call this number n. At the end of each iteration, we increment the number of observations of n by 1.

for (i in 1:10000) {
  # Random draw of 5 cards
  # n: number of suits in this draw
  # Increment observations of n
}

After the for-loop, we can view the result (e.g. as a bar chart of the relative frequencies stored in obs).

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)

for (i in 1:10000) {
  # Random draw of 5 cards
  # n: number of suits in this draw
  # Increment observations of n
}
barplot(obs / sum(obs),
        main = "Number of Suits in a Random Draw of 5 Poker Cards",
        xlab = "Number of suits",
        ylab = "Estimated probability",
        names.arg = 1:4)
grid()

While we’re learning for-loops, we may find it useful to describe the steps in natural language before we translate each step into computer code. Let’s start by translating our first comment inside the curly braces: we can simulate a random draw with sample(cards, 5). For later convenience, we store the sample in a variable called s. Next we need to count the number of unique values in s. If we apply the length() function to unique(s), we obtain the number n that we’re looking for. Finally, we add 1 to the number of observations of n.
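
Before putting these steps into the loop, it can help to try them on a hand we pick ourselves, so that the answer is known in advance. The hand below is invented for illustration. In the simulation, s comes from sample(cards, 5) instead.

```r
# A hand chosen by us, not drawn at random
s <- c("hearts", "clubs", "hearts", "spades", "clubs")

# unique() removes the repeated suits
unique(s)
## [1] "hearts" "clubs"  "spades"

# n: number of suits in this hand
n <- length(unique(s))
print(n)
## [1] 3

# Increment the tally for n = 3
obs <- numeric(4)
obs[n] <- obs[n] + 1
print(obs)
## [1] 0 0 1 0
```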

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)

for (i in 1:10000) {
  s <- sample(cards, 5)   # Random draw of 5 cards
  n <- length(unique(s))  # n: number of suits in this draw
  obs[n] <- obs[n] + 1    # Increment observations of n
}
barplot(obs / sum(obs),
        main = "Number of Suits in a Random Draw of 5 Poker Cards",
        xlab = "Number of suits",
        ylab = "Estimated probability",
        names.arg = 1:4)
grid()

The estimated probabilities are in good agreement with the exact probabilities that can be derived with combinatorial arguments. If you’re curious, you can find a derivation at the link below this video (https://math.stackexchange.com/questions/224974/probability-for-suits-in-5-card-poker-hand). One might argue that estimating the probabilities with computer simulations doesn’t add new insights if the exact probabilities are already known. However, there are many other random experiments for which the probabilities of the outcomes are unknown. In such cases, it’s common practice to resort to computer simulations that contain for-loops. In the scientific literature, computer simulations that involve random numbers are called Monte Carlo simulations. Below this video, you can find a link to a paper that describes the historical events that led to the invention of the Monte Carlo method (http://library.lanl.gov/cgi-bin/getfile?15-12.pdf).
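
For comparison, here is one possible way to compute the exact probabilities in R. It's a sketch based on a standard inclusion-exclusion count, not necessarily the derivation in the linked post: for k given suits there are choose(13 * k, 5) hands drawn only from those suits, and from these we remove the hands that miss at least one of the k suits. The names exact, j and hands are chosen for this illustration.

```r
exact <- numeric(4)   # exact[k]: probability of exactly k suits
for (k in 1:4) {
  j <- 0:k
  # Hands that contain every one of k given suits (inclusion-exclusion)
  hands <- sum((-1)^j * choose(k, j) * choose(13 * (k - j), 5))
  # choose(4, k) ways to pick which k suits appear
  exact[k] <- choose(4, k) * hands / choose(52, 5)
}
print(round(exact, 4))
## [1] 0.0020 0.1459 0.5884 0.2637
```

With 10,000 trials, the relative frequencies obs / sum(obs) from the simulation should typically land close to these values, although they vary a little from run to run.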

Most for-loops only work correctly if certain variables are properly initialised. In our example, obs must contain four zeros before the for-loop gets underway. If obs doesn’t exist when the for-loop starts, R stops with an error. If obs still holds counts from an earlier run, the new counts are silently added to the old ones and the results can be misleading. Whenever you work with a for-loop, ask yourself whether any variable that appears inside the loop needs to be initialised. If yes, then always run the for-loop together with the initialisation. It’s usually safer to run the entire script by clicking the “Source” button instead of running only selected lines with the “Run” button.
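
The sketch below illustrates one way this can go wrong. If we run the loop a second time without resetting obs, the new counts are added to the old ones and the tally no longer describes 10,000 trials. The loop is repeated verbatim here only to mimic re-running selected lines with “Run”.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

obs <- numeric(4)   # Proper initialisation
for (i in 1:10000) {
  n <- length(unique(sample(cards, 5)))
  obs[n] <- obs[n] + 1
}
sum(obs)
## [1] 10000

# Same loop again, but obs is NOT reset first
for (i in 1:10000) {
  n <- length(unique(sample(cards, 5)))
  obs[n] <- obs[n] + 1
}
sum(obs)
## [1] 20000
```

R gives no warning in the second case, which is why this kind of mistake is easy to miss.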

for-loops are easier to read when we indent the lines between the curly braces. The indentation shows that these lines play a special role: they contain the commands that are repeated during the loop. The RStudio editor usually does a good job indenting lines correctly while we’re typing. If there’s something wrong with the indentation, we can select the entire script with the keyboard shortcut “Command+A” on a Mac or “Ctrl+A” on Windows or Linux. We then indent all lines with the keyboard shortcut “Command+I” or “Ctrl+I”. If the indentation still looks wrong, we probably made a typo. In that case, we must check carefully whether parentheses or curly braces appear in the correct places.

If your for-loop doesn’t work as intended even after fixing the obvious typos, don’t despair. Even experienced programmers regularly make mistakes when writing for-loops from scratch. In our next tutorial, I’ll show you a few basic techniques to trace mistakes.

Let’s recap what we learned about for-loops.

  • We use for-loops to repeat commands in an R script.

  • for-loops should be used sparingly. Vectorised operators and preinstalled functions are almost always better options.

  • A typical use case for a for-loop is a simulation of a random experiment.

  • The syntax of a for-loop is:

    for (i in v) {
      # Command(s) to be repeated for all values i in v
    }

  • We should indent the commands between the curly braces so that it’s easy to see which commands are repeated.

  • We must often initialise some variables before a for-loop (e.g. variables that count how often certain outcomes occur during the loop).

  • We can avoid accidentally leaving out initialisations by clicking “Source” instead of “Run”.
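
On the second recap point: even the card simulation can be written without an explicit for-loop. The following is only a sketch of one alternative, using the preinstalled functions replicate() and tabulate(), neither of which appears in this tutorial. The name n_suits is chosen for illustration.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Repeat the draw 10,000 times; each repetition returns one number
n_suits <- replicate(10000, length(unique(sample(cards, 5))))

# Count how often each of the values 1, 2, 3, 4 occurs
obs <- tabulate(n_suits, nbins = 4)
print(sum(obs))
## [1] 10000
```

Here no tally needs to be initialised by hand, because tabulate() builds the vector of counts in one step.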

Next time we go on a bug hunt.

See you soon.