Chapter 37 Functions
Welcome back to Quantitative Reasoning!
We’ve worked with many preinstalled functions during this course.
All instructions that are followed by a pair of parentheses (e.g. hist(), sd() or lm()) are functions that perform useful operations when we supply appropriate arguments as input.
Sometimes R doesn’t have a preinstalled function for operations that would be
useful for us.
In this tutorial, we learn how to fill this gap by writing our own functions.
In tutorial 25, we talked about z-scores. We used this code snippet to calculate z-scores for the performance of heptathletes in two disciplines (200-metre run and long jump) of the 2012 Olympic heptathlon. You can find the data at the URL linked below this video (https://michaelgastner.com/data_for_QR/hept2012.csv).
hept <- read.csv("hept2012.csv")
hept$z_run200 <-
  (hept$run200 - mean(hept$run200, na.rm = TRUE)) /
  sd(hept$run200, na.rm = TRUE)
hept$z_lj <-
  (hept$lj - mean(hept$lj, na.rm = TRUE)) /
  sd(hept$lj, na.rm = TRUE)
This code did the job for us back then, but the commands are long and complicated. For an outsider, it wouldn’t be obvious what we are trying to accomplish. We can, of course, insert a comment (e.g.: “The following lines calculate z-scores”).
# The following lines calculate z-scores
hept$z_run200 <-
  (hept$run200 - mean(hept$run200, na.rm = TRUE)) /
  sd(hept$run200, na.rm = TRUE)
hept$z_lj <-
  (hept$lj - mean(hept$lj, na.rm = TRUE)) /
  sd(hept$lj, na.rm = TRUE)
Still, it would be even better if the commands were replaced by a function with a descriptive name (e.g. zscore()).
hept$z_run200 <- zscore(hept$run200)
hept$z_lj <- zscore(hept$lj)
Code that can be easily understood by another person without specific domain knowledge is called self-documenting. In general, we should aim to write self-documenting code instead of long lines of code with complicated instructions that need explicit comments to be intelligible.
Right now, our code doesn’t work because R doesn’t have a preinstalled function zscore(), but it’s easy to write this function ourselves.
Here is the general pattern that we need to follow when we write our own
functions.
function_name <- function(param1, param2, ...) {
  # Body of the function:
  # commands that do something with param1, param2, ...
  # last evaluated object is returned as function value
}
The objects param1, param2, … are called the parameters of the function.
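As a minimal sketch of this pattern, here is a function with two parameters. The name rectangle_area and the parameters width and height are made up for illustration; they are not part of the scripts in this course.

```r
# Hypothetical example: a function with two parameters
rectangle_area <- function(width, height) {
  # The last evaluated expression is returned as the function value
  width * height
}

# Call the function with the values 3 and 4
area <- rectangle_area(3, 4)
print(area)
```

Inside the function body, width takes the value 3 and height takes the value 4, so the call returns their product.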
Our zscore() function only has one parameter: a numeric vector.
We can give the parameter any name we wish.
Let me call it v for “vector”.
The function body consists of the general commands that calculate z-scores for a numeric vector v.
It’s also a good idea to include a comment that explains the function’s main
purpose.
zscore <- function(v) {
  # zscore() returns distance from mean in units of standard deviations
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}
It’s best practice to place function definitions before the first call to this
function in the script.
Before we can call any function written by ourselves, we must first add it to
our environment.
To accomplish this task, we run the line with the keyword function (e.g. by placing the cursor on this line and clicking the Run button).
Now the zscore() function appears in the Environment tab.
Adding the function to our environment doesn’t directly calculate any concrete
z-scores, but we have provided R with two important pieces of information.
- R is aware that we may, at some point, call a function with the name zscore.
- If we call zscore(), followed by an object inside a pair of parentheses, R makes a copy of this object and calls it v. Then R performs the commands in the function body of zscore(). When finished, R returns the value on the last line of the function body, which is in our case the z-score calculated by (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE).
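To see these steps at work, here is a small sketch that calls zscore() on a short vector. The numbers in x are made up for illustration and were chosen so that the result is easy to check by hand: the mean is 4 and the standard deviation is 2.

```r
# Definition of zscore() as given in this tutorial
zscore <- function(v) {
  # zscore() returns distance from mean in units of standard deviations
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

# Made-up numbers for illustration: mean is 4, standard deviation is 2
x <- c(2, 4, 6)
z <- zscore(x)
print(z)  # -1 0 1

# The function worked with its own object v, so x itself is unchanged
print(x)  # 2 4 6
```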
The concrete value that we give to a function parameter is called an
argument.
For example, when we run zscore(hept$run200), the argument is hept$run200.
The main advantage of a function is that it can perform a specific set of operations on different objects.
If we swap hept$run200 for hept$lj in the parentheses, we still calculate z-scores, but now they are z-scores of a different vector.
hept$z_run200 <- zscore(hept$run200)
hept$z_lj <- zscore(hept$lj)
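The same idea can be tried without downloading any data. The sketch below uses two columns of R's built-in iris data frame as stand-ins for the heptathlon columns; that substitution is ours, chosen only so that the example runs on its own. Whatever numeric vector we pass as the argument, the resulting z-scores should have a mean close to 0 and a standard deviation close to 1.

```r
# Definition of zscore() as given in this tutorial
zscore <- function(v) {
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

# Same function, two different arguments
z_width <- zscore(iris$Sepal.Width)
z_length <- zscore(iris$Petal.Length)

# Check the results: means are (almost) 0, standard deviations are 1
print(round(mean(z_width), 10))
print(sd(z_width))
print(round(mean(z_length), 10))
print(sd(z_length))
```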
Functions are mainly used to avoid command sequences that are almost identical copies of each other. Let’s consider another piece of code that we wrote in an earlier lesson. In tutorial 21, we wrote a script that generated a multi-panel plot with histograms of sepal widths for different iris species. I made only a few small edits to make the structure of the code more obvious.
par(mfrow = c(3, 1))
iris_breaks <- seq(1.8, 4.6, 0.1)
iris_ylim <- c(0, 12)
iris_xlab <- "Sepal width (cm)"
hist(iris$Sepal.Width[iris$Species == "setosa"],
     breaks = iris_breaks,
     ylim = iris_ylim,
     xlab = iris_xlab,
     main = "setosa")
grid()
hist(iris$Sepal.Width[iris$Species == "versicolor"],
     breaks = iris_breaks,
     ylim = iris_ylim,
     xlab = iris_xlab,
     main = "versicolor")
grid()
hist(iris$Sepal.Width[iris$Species == "virginica"],
     breaks = iris_breaks,
     ylim = iris_ylim,
     xlab = iris_xlab,
     main = "virginica")
grid()
par(mfrow = c(1, 1))
The structure consists of three repetitions of the hist() function, each followed by a call to the grid() function.
The arguments passed to the hist() function only differ in one detail: the name of the species (setosa, versicolor or virginica).
We can make the code more legible by turning the repeating parts of the code into a function, say hist_panel().
We’ll treat the species name as a function parameter called species_name.
Following the general syntax for a function definition, we write:
hist_panel <- function(species_name) {
  # Function body
}
For the function body, we first copy one instance from our previous code that contains the repeating combination of the hist() and grid() functions.
Then we replace the explicit name of the species (e.g. "setosa") by the parameter species_name.
hist_panel <- function(species_name) {
  # Show histogram of sepal width distribution for a given species
  hist(iris$Sepal.Width[iris$Species == species_name],
       breaks = iris_breaks,
       ylim = iris_ylim,
       xlab = iris_xlab,
       main = species_name)
  grid()
}
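Note that hist_panel() uses the objects iris_breaks, iris_ylim and iris_xlab, which are not parameters. They must exist in our environment before we call the function. One possible alternative, sketched here under the made-up name hist_panel2(), passes these settings as additional parameters. This variant is only an illustration of a function with several parameters; it is not part of the script from tutorial 21.

```r
# Hypothetical variant of hist_panel(): plot settings are passed as parameters
hist_panel2 <- function(species_name, breaks, ylim, xlab) {
  # Show histogram of sepal width distribution for a given species
  hist(iris$Sepal.Width[iris$Species == species_name],
       breaks = breaks,
       ylim = ylim,
       xlab = xlab,
       main = species_name)
  grid()
}

# Call the variant with the same settings as in the original script
hist_panel2("setosa",
            breaks = seq(1.8, 4.6, 0.1),
            ylim = c(0, 12),
            xlab = "Sepal width (cm)")
```

The trade-off is that each call becomes longer, so the one-parameter version remains the more convenient choice when all panels share the same settings.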
Now we can replace the repeating code blocks by calls to the function hist_panel().
par(mfrow = c(3, 1))
iris_breaks <- seq(1.8, 4.6, 0.1)
iris_ylim <- c(0, 12)
iris_xlab <- "Sepal width (cm)"
hist_panel("setosa")
hist_panel("versicolor")
hist_panel("virginica")
par(mfrow = c(1, 1))
The new code is much shorter, but we can make it even better.
We don’t need to repeat the function name hist_panel on three consecutive lines.
Setosa, versicolor and virginica are the three species that occur in the column iris$Species, so we can call hist_panel() with a for-loop that iterates over the unique values in iris$Species.
par(mfrow = c(3, 1))
iris_breaks <- seq(1.8, 4.6, 0.1)
iris_ylim <- c(0, 12)
iris_xlab <- "Sepal width (cm)"
for (species in unique(iris$Species)) {
  hist_panel(species)
}
par(mfrow = c(1, 1))
The version with the for-loop makes it clearer how we choose the species to be included in the plot.
Consequently, this version comes closer to our aim of writing self-documenting
code.
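The combination of a function and a for-loop is not limited to plots. As a small sketch, the following code calls a helper function once for every species and prints a number instead of drawing a histogram. The helper mean_sepal_width() is a made-up name for this illustration, and the call to as.character() is our own addition so that the species names are stored as plain text.

```r
# Hypothetical helper: mean sepal width for one species
mean_sepal_width <- function(species_name) {
  mean(iris$Sepal.Width[iris$Species == species_name])
}

# Unique species names as plain character strings
species_names <- as.character(unique(iris$Species))
print(species_names)

# Call the helper once for each species
for (species in species_names) {
  cat(species, ":", mean_sepal_width(species), "\n")
}
```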
In summary, we learned how to write our own functions in R.
Functions are defined with the following syntax.
function_name <- function(param1, param2, ...) {
  # Body of the function:
  # commands that do something with param1, param2, ...
  # last evaluated object is returned as function value
}
The objects in the parentheses (param1, param2, …) are called parameters. When a function is called with concrete arguments, R replaces the parameters in the function body by the values of the arguments.
Functions are mainly used to replace repeating code sequences. The resulting code is usually shorter, more readable and easier to maintain.
With this tutorial, our R lessons are drawing to a close.
We’ve come a long way from simple vector operations to more complex scripts with for-loops and functions.
There are many more R features that are worth exploring.
For example, R can make geographic maps, animations and web apps.
It’s even possible to write complete books with R.
I hope R has piqued your interest.
Thank you for watching these tutorials.
Goodbye!