Chapter 36 Debugging `for`-loops

Welcome back to Quantitative Reasoning! In our previous tutorial, we wrote our first for-loops. There weren’t any mistakes in those for-loops because I debugged them before I wrote that tutorial. Alas, in real life, it’s common to make mistakes in early versions of computer programs. In this tutorial, we’ll look at some common mistakes in for-loops and learn how to detect and fix them.

Last time, we wrote a script that estimated the probability distribution of the number of suits in a random draw of five poker cards. For demonstration purposes, I’ve deliberately introduced a few mistakes into our script. Let’s clear out our environment by clicking on the broom button in the Environment tab. When we source the script, we get an error.

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
for (i in 1:10000) {
  s <- sample(cards, 5)   # Random draw of 5 cards
  n <- length(s)          # n: number of suits in this draw
  obs[n] <- obs[n] + 1    # Increment observations of n
}

## Error in eval(expr, envir, enclos): object 'obs' not found

In the error message, R points out that it can’t find the object obs. Error messages of this kind usually hint at a missing initialisation. Looking through our script, we notice that we attempt to access an element in obs with square brackets during the for-loop, but we forgot to initialise obs before the loop. R can’t access elements in a vector that doesn’t exist, so let’s fix this bug by inserting a proper initialisation.
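To see this bug and its fix in isolation, here is a minimal sketch you could run in a fresh session. It deliberately increments an element of a vector that doesn’t exist yet, captures R’s complaint with tryCatch(), and then repeats the same increment after initialising obs. The helper variable msg is our own addition for illustration; it isn’t part of the original script.

```r
# Start from a clean slate, like clicking the broom button:
# remove any existing 'obs' from the global environment
if (exists("obs", envir = globalenv(), inherits = FALSE)) {
  rm(list = "obs", envir = globalenv())
}

# Incrementing an element of a vector that was never created fails
msg <- tryCatch({
  obs[1] <- obs[1] + 1
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)  # "object 'obs' not found" (in an English-language session)

# After initialising obs, the same increment works
obs <- numeric(4)
obs[1] <- obs[1] + 1
print(obs)  # 1 0 0 0
```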

Unfortunately, we aren’t out of the woods yet. Now we get a different error message.

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
for (i in 1:10000) {
  s <- sample(cards, 5)   # Random draw of 5 cards
  n <- length(s)          # n: number of suits in this draw
  obs[n] <- obs[n] + 1    # Increment observations of n
}
barplot(obs / sum(obs),
        main = "Number of Suits in a Random Draw of 5 Poker Cards",
        xlab = "Number of suits",
        ylab = "Estimated probability",
        names.arg = 1:4)

## Error in plot.window(xlim, ylim, log = log, ...): need finite 'ylim' values

This error message hints at a problem that R encountered when preparing a figure with the barplot() function: “Error in plot.window”. Unfortunately, R doesn’t name the variable in the script that causes the problem. In such cases, it’s a good idea to look at the Environment tab to check whether we can spot a variable with an unexpected value. There’s an NA in obs, which shouldn’t be there. And why does obs have five elements although we initialised it with only four elements? Something seems to be wrong with the for-loop. How can we trace the mistake? Here are three basic debugging techniques.

Reduce the number of iterations.
Inspect the values of variables during the loop with the print() function.
If the for-loop makes calls to R’s random number generator (e.g. with the functions sample() or rnorm()), fix a seed before the loop so that results are reproducible.
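The third technique is easy to check for yourself. In this small sketch, we draw two hands from the same deck, each time right after calling set.seed() with the same arbitrary value, and the two hands come out identical. Without the set.seed() calls, the two draws would usually differ. The names draw1 and draw2 are just for illustration.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Same seed before each draw: the random number generator starts
# from the same state, so sample() returns the same cards
set.seed(1234567)
draw1 <- sample(cards, 5)
set.seed(1234567)
draw2 <- sample(cards, 5)

print(draw1)
print(identical(draw1, draw2))  # TRUE
```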

Let’s apply these techniques. We reduce the number of iterations from 10,000 to 3 and insert several calls to the print() function. The first call to print() announces that a new iteration has started. We then print the current value of the loop variable i, the sample s drawn during this iteration, the number of suits n and the values stored in obs. Finally, we set an arbitrary seed before the for-loop.

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
set.seed(1234567)
for (i in 1:3) {
  print("New iteration")
  print(i)
  s <- sample(cards, 5)   # Random draw of 5 cards
  print(s)
  n <- length(s)          # n: number of suits in this draw
  print(n)
  obs[n] <- obs[n] + 1    # Increment observations of n
  print(obs)
}

## [1] "New iteration"
## [1] 1
## [1] "diamonds" "clubs"    "spades"   "diamonds" "diamonds"
## [1] 5
## [1]  0  0  0  0 NA
## [1] "New iteration"
## [1] 2
## [1] "spades" "clubs"  "hearts" "spades" "spades"
## [1] 5
## [1]  0  0  0  0 NA
## [1] "New iteration"
## [1] 3
## [1] "diamonds" "hearts"   "spades"   "diamonds" "clubs"   
## [1] 5
## [1]  0  0  0  0 NA

We can now analyse the sequence of events in small steps. During the first iteration, we sample "diamonds", "clubs", "spades", "diamonds" and again "diamonds". The next line of output shows that n has the value 5. This value is alarming because there are only three different suits in the sample ("diamonds", "clubs" and "spades"), so n should be 3 instead of 5. The following line of output shows that our code has derailed. We are trying to access obs[5], which we haven’t initialised. Reading obs[5] returns NA, NA + 1 is still NA, and assigning this value to obs[5] appends NA as a fifth element to obs. We conclude that something must have gone terribly wrong when we assigned a value to n. Going back to the code, the offending assignment must be n <- length(s). Can you see what’s wrong?
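Before answering that question, it may help to reproduce the appearance of the NA on its own. The following sketch starts from a fresh obs of four zeros and performs the same increment with n equal to 5. It also suggests why barplot() complained about non-finite 'ylim' values, although this is our interpretation rather than something R tells us directly: a single NA turns sum(obs) into NA, so every bar height obs / sum(obs) becomes NA.

```r
# obs as initialised before the loop: four zeros
obs <- numeric(4)

# Reading an index beyond the end of a vector returns NA, not an error
print(obs[5])          # NA

# NA + 1 is still NA; assigning to obs[5] stretches obs to length 5
obs[5] <- obs[5] + 1
print(obs)             # 0 0 0 0 NA
print(length(obs))     # 5

# A single NA makes the total NA, so every relative frequency is NA too
print(sum(obs))        # NA
print(obs / sum(obs))  # NA NA NA NA NA
```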

The variable s is a sample of five cards, so it’s a vector of length 5. Consequently, our current script assigns the value 5 to n. However, we wanted to assign the number of different suits in s, not the number of cards. We forgot to apply unique() to s before counting: length(unique(s)) counts each suit only once. Let’s correct this mistake and source the script again.
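We can check this diagnosis in the console with the first draw from the debugging output, typed in by hand:

```r
# First draw from the debugging output above
s <- c("diamonds", "clubs", "spades", "diamonds", "diamonds")

print(length(s))          # 5: the number of cards
print(unique(s))          # "diamonds" "clubs" "spades"
print(length(unique(s)))  # 3: the number of different suits
```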

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
set.seed(1234567)
for (i in 1:3) {
  print("New iteration")
  print(i)
  s <- sample(cards, 5)   # Random draw of 5 cards
  print(s)
  n <- length(unique(s))  # n: number of suits in this draw
  print(n)
  obs[n] <- obs[n] + 1    # Increment observations of n
  print(obs)
}

## [1] "New iteration"
## [1] 1
## [1] "diamonds" "clubs"    "spades"   "diamonds" "diamonds"
## [1] 3
## [1] 0 0 1 0
## [1] "New iteration"
## [1] 2
## [1] "spades" "clubs"  "hearts" "spades" "spades"
## [1] 3
## [1] 0 0 2 0
## [1] "New iteration"
## [1] 3
## [1] "diamonds" "hearts"   "spades"   "diamonds" "clubs"   
## [1] 4
## [1] 0 0 2 1

Because we use the same seed as in our previous run, we draw the same sample of cards in the first iteration: "diamonds", "clubs", "spades", "diamonds" and "diamonds". In our new version of the script, we correctly assign a value of 3 to n. Looking at the following line of output, we confirm that the third element in obs is set equal to 1, exactly as we want it. As a simple exercise, please go through the rest of the console output to confirm that the next two iterations also produce the expected result. Once we are confident that we have fixed all bugs, we should remove set.seed(), restore the original number of iterations and delete the print() calls.
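For reference, here is what the script might look like after these clean-up steps. It is a sketch assembled from the versions above: the seed and the print() calls are gone, the loop runs 10,000 times again, and the differences from the buggy script we started with are the initialisation of obs and the corrected line n <- length(unique(s)). Because the draws are random, the bar heights will vary slightly from run to run.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)

# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
for (i in 1:10000) {
  s <- sample(cards, 5)   # Random draw of 5 cards
  n <- length(unique(s))  # n: number of suits in this draw
  obs[n] <- obs[n] + 1    # Increment observations of n
}
barplot(obs / sum(obs),
        main = "Number of Suits in a Random Draw of 5 Poker Cards",
        xlab = "Number of suits",
        ylab = "Estimated probability",
        names.arg = 1:4)
```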

There are more sophisticated debugging methods than the one I’ve just shown you. RStudio has a menu item “Debug” that offers more advanced options. I personally hardly ever use them. For the relatively simple R scripts that we write in this course, RStudio’s debugger is likely to be overkill. In summary, I recommend the following three basic techniques as an effective first-aid kit:

Reduce the number of iterations.
Print out the values of variables during the loop.
If you use a random number generator, fix a seed so that outcomes are reproducible.
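If you expect to debug the same loop more than once, one possible variation is to control all three techniques with a single switch instead of editing the lines by hand as we did above. The variable names debugging and n_iter are hypothetical choices of ours. Setting debugging to FALSE restores the full, unseeded run without deleting any lines.

```r
# Hypothetical switch: TRUE while hunting bugs, FALSE for the real run
debugging <- TRUE

cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
obs <- numeric(4)

if (debugging) set.seed(1234567)        # Technique 3: reproducible draws
n_iter <- if (debugging) 3 else 10000   # Technique 1: fewer iterations

for (i in 1:n_iter) {
  s <- sample(cards, 5)                 # Random draw of 5 cards
  n <- length(unique(s))                # n: number of suits in this draw
  obs[n] <- obs[n] + 1                  # Increment observations of n
  if (debugging) {                      # Technique 2: print intermediate values
    print("New iteration")
    print(i)
    print(s)
    print(n)
    print(obs)
  }
}
```

A drawback of this design is that the debugging lines stay in the script and add clutter, so for short scripts like ours, removing them by hand is often just as convenient.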

In the last two tutorials, we learned how to work with for-loops. They let us repeat a sequence of commands without typing the same lines again and again. Next time, we’ll learn another way to reduce repetitive code: packaging reusable code into functions.

See you soon.
