Chapter 35 for-loops
Welcome back to Quantitative Reasoning!
Computers, unlike humans, don’t mind repeating tasks over and over again.
Syntactic features of a programming language that cause a computer to repeat
one or several instructions are called loops.
R has different types of loops suitable for different applications.
Starting with tutorial 7, we’ve already seen many instances where
vectorisation carries out a repeated task.
For example, in the command c(2, 4, 7, 6) + c(5, 7, 3, 9), the vectorised
+ operator carries out a loop over the four elements in each argument.
In each iteration (i.e. for each pair of corresponding elements), R carries
out one addition.
c(2, 4, 7, 6) + c(5, 7, 3, 9)
## [1] 7 11 10 15
Some preinstalled functions also implicitly execute loops.
For example, we can simulate ten rolls of a die with sample().
sample(6, size = 10, replace = TRUE)
## [1] 5 4 3 4 3 6 1 3 5 3
In this case, the loop has ten iterations, one for each die roll.
Vectorised operators and preinstalled functions are the fastest and most
elegant techniques to implement loops in R, but there are limits to these
techniques when we want to iterate complex procedures.
In this tutorial, we’ll learn how to cope with such situations by using
for-loops.
We can use a for-loop when we have a vector, say v, and want to perform
some action for each of v’s elements.
The general syntax is as follows.
for (i in v) {
  # Command(s) to be repeated for all values i in v
}
Here is a simple example of a for-loop.
for (i in c("Our", "first", "for-loop")) {
  print(i)
}
## [1] "Our"
## [1] "first"
## [1] "for-loop"
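As a second illustration of the syntax, the ten die rolls that sample() simulated in a single call earlier can also be generated one roll per iteration. This is a minimal sketch for practice only. The names rolls and k are our own choices, and the single vectorised call remains the better option.

```r
# Vector to hold the results: one slot per die roll
rolls <- numeric(10)
for (k in 1:10) {
  # Each iteration simulates one roll and stores it in position k
  rolls[k] <- sample(6, size = 1)
}
rolls
```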
In the code block between the curly braces, i takes the values in the
vector c("Our", "first", "for-loop") one after another, from the first to
the last element.
The command print(i) shows the value of i in each iteration as console
output.
We don’t need to call the “iterator” variable “i”, as long as we change its
name consistently everywhere in the for-loop.
For example, we may want to replace “i” with the more descriptive variable
name “word”.
The code produces the same output as before.
for (word in c("Our", "first", "for-loop")) {
  print(word)
}
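The loops above iterate over values. A loop can also iterate over positions, which is how we might write the vectorised addition from the start of this tutorial as an explicit for-loop. This is only a sketch for comparison: x, y and total are illustrative names, and seq_along() (a base R function not introduced in this tutorial) returns the positions 1, 2, 3, 4 of x.

```r
x <- c(2, 4, 7, 6)
y <- c(5, 7, 3, 9)
# Initialise the result with zeros so that every position exists
total <- numeric(length(x))
# seq_along(x) gives the positions 1, 2, 3, 4
for (k in seq_along(x)) {
  total[k] <- x[k] + y[k]   # one addition per iteration
}
total
identical(total, x + y)     # TRUE: same result as the vectorised +
```

The vectorised x + y does the same work in one line and runs faster. This is why the explicit loop is only worth writing when no vectorised alternative exists.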
We can also do more interesting things than printing out all values.
For example, we can repeat simulations of a random experiment to estimate the
probability of certain outcomes.
Let’s consider drawing five random cards from a standard deck of 52 poker
cards.
The deck contains 13 cards from each of the four suits: diamonds, clubs,
spades and hearts.
How many different suits do we get in a random draw of 5 cards?
Let’s simulate the draws with R and tally the results.
We first generate a deck of cards with the rep() function from our previous
tutorial.
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
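Before using the deck, it can be worth a quick check that rep() produced what we intended. This sketch uses table() (base R) to count how often each suit occurs. We expect 52 cards with 13 of each suit.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
length(cards)  # 52 cards in total
table(cards)   # 13 cards of each suit
```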
Then we initialise a vector obs of length four, in which we count how often
the four possible outcomes occur when we repeatedly draw cards randomly
(i.e. obs[n] stores the number of trials in which we draw n different
suits).
We can initialise obs with rep(0, 4) or numeric(4) to ensure that we
start counting from zero.
# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
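The two initialisations mentioned above are interchangeable. A quick check in the console confirms that both produce the same numeric vector of four zeros.

```r
obs <- numeric(4)
obs                                # four zeros
identical(numeric(4), rep(0, 4))   # TRUE: both are numeric vectors of four zeros
```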
The next code element is a for-loop in which we repeatedly draw five random
cards.
If we want to repeat the random draw 10,000 times, we can start the
for-loop with for (i in 1:10000).
In each of the 10,000 iterations, we draw five cards and count the number
of suits.
Let’s call this number n.
At the end of each iteration, we increment the number of observations of
n by 1.
for (i in 1:10000) {
  # Random draw of 5 cards
  # n: number of suits in this draw
  # Increment observations of n
}
After the for-loop, we can view the result (e.g. as a bar chart of the
relative frequencies stored in obs).
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
for (i in 1:10000) {
  # Random draw of 5 cards
  # n: number of suits in this draw
  # Increment observations of n
}
barplot(obs / sum(obs),
        main = "Number of Suits in a Random Draw of 5 Poker Cards",
        xlab = "Number of suits",
        ylab = "Estimated probability",
        names.arg = 1:4)
grid()
While we’re learning for-loops, we may find it useful to describe the
steps in natural language before we translate each step into computer code.
Let’s start by translating our first comment inside the curly braces: we can
simulate a random draw with sample(cards, 5).
For later convenience, we store the sample in a variable called s.
Next we need to count the number of unique values in s.
If we apply the length() function to unique(s), we obtain the number n
that we’re looking for.
Finally, we add 1 to the number of observations of n.
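Before placing the three steps inside the loop, it can help to run them once by hand and inspect each intermediate result. The sketch below does exactly that for a single hand. Because the draw is random, the printed values change every time it runs.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
s <- sample(cards, 5)     # one random hand of 5 cards
s
unique(s)                 # the suits in the hand, each listed once
n <- length(unique(s))    # number of different suits: between 1 and 4
n
```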
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
# Initialise tally of observations
# obs[n]: number of trials in which we draw n different suits
obs <- numeric(4)
for (i in 1:10000) {
  s <- sample(cards, 5)    # Random draw of 5 cards
  n <- length(unique(s))   # n: number of suits in this draw
  obs[n] <- obs[n] + 1     # Increment observations of n
}
barplot(obs / sum(obs),
        main = "Number of Suits in a Random Draw of 5 Poker Cards",
        xlab = "Number of suits",
        ylab = "Estimated probability",
        names.arg = 1:4)
grid()
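For comparison with the claim that preinstalled functions are often the better option, here is one possible way to run the same simulation without writing the loop ourselves. This is a sketch, not the tutorial’s method. It uses replicate() and factor(), base R functions not covered in this tutorial, and n_suits is an illustrative name.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
# replicate() evaluates the expression 10,000 times and collects the results
n_suits <- replicate(10000, length(unique(sample(cards, 5))))
# Using factor() with levels 1:4 keeps all four outcomes in the table,
# even if one of them (e.g. a single suit) happens not to occur
obs <- table(factor(n_suits, levels = 1:4))
obs / sum(obs)   # estimated probabilities, comparable to the bar chart
```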
The estimated probabilities are in good agreement with the exact
probabilities that can be derived with combinatorial arguments.
If you’re curious, you can find a derivation at the link below this video
(https://math.stackexchange.com/questions/224974/probability-for-suits-in-5-card-poker-hand).
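If you want to put numbers next to the bar chart, here is one way to compute the exact probabilities in R. This is a sketch based on our own inclusion-exclusion count, which may be presented differently from the derivation at the link. The helper name hands_with_all_suits is hypothetical. For a fixed set of k suits, it counts the hands drawn from those 13k cards and then corrects, by inclusion-exclusion, for hands that miss one or more of the k suits.

```r
# Hypothetical helper: number of 5-card hands that contain every one of
# k given suits and no other suit (inclusion-exclusion over missing suits)
hands_with_all_suits <- function(k) {
  j <- 0:k   # number of suits excluded
  sum((-1)^j * choose(k, j) * choose(13 * (k - j), 5))
}
# Choose which n of the 4 suits appear, then count the hands using exactly those
exact <- sapply(1:4, function(n) choose(4, n) * hands_with_all_suits(n)) /
  choose(52, 5)
round(exact, 4)   # approx. 0.0020 0.1459 0.5884 0.2637
sum(exact)        # 1 (up to floating-point rounding)
```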
One might argue that estimating the probabilities with computer simulations
doesn’t add new insights if the exact probabilities are already known.
However, there are many other random experiments for which the probabilities
of the outcomes are unknown.
In such cases, it’s common practice to resort to computer simulations that
contain for-loops.
In the scientific literature, computer simulations that involve random numbers
are called Monte Carlo simulations.
Below this video, you can find a link to a paper that describes the historical
events that led to the invention of the Monte Carlo method
(http://library.lanl.gov/cgi-bin/getfile?15-12.pdf).
Most for-loops only work correctly if certain variables are properly
initialised.
In our example, obs must contain four zeros before the for-loop gets
underway.
If obs isn’t initialised before entering the for-loop, we get an error (if
obs doesn’t exist yet) or nonsensical results (if obs still contains counts
from an earlier run).
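The sketch below mimics the second failure mode. After a first complete run, only the loop lines are run again, as if we had selected them and clicked “Run” without re-running obs <- numeric(4). Because obs still holds the old counts, the tally no longer matches the 10,000 trials of the loop. Repeating the loop body literally is deliberate here, to imitate running the same lines twice.

```r
cards <- rep(c("diamonds", "clubs", "spades", "hearts"), 13)
obs <- numeric(4)   # initialisation: runs only once

# First run of the loop
for (i in 1:10000) {
  s <- sample(cards, 5)
  n <- length(unique(s))
  obs[n] <- obs[n] + 1
}
sum(obs)   # 10000

# Second run of the same loop WITHOUT re-running obs <- numeric(4)
for (i in 1:10000) {
  s <- sample(cards, 5)
  n <- length(unique(s))
  obs[n] <- obs[n] + 1
}
sum(obs)   # 20000: the old counts are still in obs
sum(obs / 10000)   # 2, so these "probabilities" no longer add up to 1
```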
Whenever you work with a for-loop, ask yourself whether any variable that
appears inside the loop needs to be initialised.
If yes, then always run the for-loop together with the initialisation.
It’s usually safer to run the entire script by clicking the “Source” button
instead of running only selected lines with the “Run” button.
for-loops are easier to read when we indent the lines between the curly
braces.
The indentation shows that these lines play a special role: they contain the
commands that are repeated during the loop.
The RStudio editor usually does a good job indenting lines correctly while
we’re typing.
If there’s something wrong with the indentation, we can select the entire
script with the keyboard shortcut “Command+A” on a Mac or “Ctrl+A” on Windows
or Linux.
We then indent all lines with the keyboard shortcut “Command+I” or “Ctrl+I”.
If the indentation still looks wrong, we probably made a typo.
In that case, we must check carefully whether parentheses or curly braces
appear in the correct places.
If your for-loop doesn’t work as intended even after fixing the obvious
typos, don’t despair.
Even experienced programmers regularly make mistakes when writing for-loops
from scratch.
In our next tutorial, I’ll show you a few basic techniques for tracing
mistakes.
Let’s recap what we learned about for-loops.
- We use for-loops to repeat commands in an R script.
- for-loops should be used sparingly. Vectorised operators and preinstalled functions are almost always better options.
- A typical use case for a for-loop is a simulation of a random experiment.
- The syntax of a for-loop is:
  for (i in v) {
    # Command(s) to be repeated for all values i in v
  }
- We should indent the commands between the curly braces so that it’s easy to see which commands are repeated.
- We must often initialise some variables before a for-loop (e.g. variables that count how often certain outcomes occur during the loop).
- We can avoid accidentally leaving out initialisations by clicking “Source” instead of “Run”.
Next time we go on a bug hunt.
See you soon.