Chapter 11 Tables
Welcome back to Quantitative Reasoning!
In the previous tutorials, we learned how to use logical vectors.
With logical vectors we can find out, for example, how many passengers on the
Titanic travelled in the third class.
We can find the answer with the same strategy that we learned last time.
We pass the logical vector titanic$class == "3rd" as an argument to the sum() function.
sum(titanic$class == "3rd")
## [1] 709
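To try the same idea without the course data, here is a minimal sketch on a small made-up vector. The name toy_class and its values are hypothetical and are not the real titanic data. It only illustrates why sum() counts the TRUE values of a logical vector.

```r
# Hypothetical toy data, NOT the real titanic data set
toy_class <- c("3rd", "Crew", "2nd", "3rd", "1st", "Crew", "3rd")

# The comparison returns one TRUE/FALSE value per element
is_third <- toy_class == "3rd"
is_third

# sum() treats TRUE as 1 and FALSE as 0, so it counts the TRUE values
n_third <- sum(is_third)
n_third
```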
Suppose we also want to find out the corresponding numbers for all other
classes.
In principle, we can replace "3rd" in the previous command with each of the other unique values in the class column.
That is, we first run unique(titanic$class), a command we already saw in tutorial 06, and then replace "3rd" with each of the other elements in the output vector.
unique(titanic$class)
## [1] "3rd" "Crew" "2nd" "1st"
Unfortunately, this approach needs a lot of repeated code.
We would have to replace "3rd" with "Crew", "2nd" and "1st" in turn.
And we must be careful not to make a typo in any of the class categories.
In this tutorial, we’ll learn a much more convenient method to summarise this
information: the table()
function.
Thanks to the table()
function, we can easily construct two types of tables
that are presented in chapter 2 of our textbook: frequency tables and
contingency tables.
In our example, here is how we find out with a single
function call how many people are in each category of the class
column.
table(titanic$class) # Frequency table
##
##  1st  2nd  3rd Crew
##  286  271  709  942
We can see in compact form that there were 286 first-class passengers, 271
second-class passengers and so on.
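As a rough check that table() really gives the same numbers as the repeated sum() approach, the following sketch compares the two on a small made-up vector. The names toy_class, counts and manual are hypothetical, and the data are not the real titanic data.

```r
# Hypothetical toy data, NOT the real titanic data set
toy_class <- c("3rd", "Crew", "2nd", "3rd", "1st", "Crew", "3rd")

# One call produces the count for every category
counts <- table(toy_class)
counts

# The repeated-sum() approach, written out for every unique value
manual <- sapply(sort(unique(toy_class)),
                 function(k) sum(toy_class == k))
manual

# Both approaches agree category by category
all(names(counts) == names(manual))
all(as.vector(counts) == manual)

# A single count can be looked up by its category name
counts[["3rd"]]
```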
If we want the relative frequencies instead of the absolute numbers, we simply
divide by the total number of people on the Titanic.
That is, we divide by nrow(titanic).
table(titanic$class) / nrow(titanic) # Relative frequency table
##
##       1st       2nd       3rd      Crew
## 0.1295290 0.1227355 0.3211051 0.4266304
Now we can easily see that around 13% of the individuals on board were first-class passengers, 12% second-class passengers and so on.
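The percentages can be reproduced from the counts alone. The sketch below copies the four counts from the frequency table above into a named vector (class_counts is a hypothetical name) and assumes that every person falls into exactly one class, so that the counts add up to nrow(titanic). It also shows base R's prop.table(), which performs the same division by the total.

```r
# Counts copied from the frequency table in this tutorial
class_counts <- c("1st" = 286, "2nd" = 271, "3rd" = 709, "Crew" = 942)

# Total number of people; assumed to equal nrow(titanic)
total <- sum(class_counts)
total

# Relative frequencies: each count divided by the total
rel_freq <- class_counts / total

# Expressed as percentages, rounded to one decimal place
round(100 * rel_freq, 1)

# prop.table() divides by the sum of its argument, giving the same result
all.equal(rel_freq, prop.table(class_counts))
```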
Both of the tables we’ve just produced are called one-way tables because they
split the observations by a single category, in our case the unique values in
the column class.
But we can also split the observations simultaneously by two categories.
For example,
table(titanic$class, titanic$survived) # Contingency table
##
##        FALSE TRUE
##   1st    113  173
##   2nd    154  117
##   3rd    528  181
##   Crew   701  241
shows how people on board are split by class and survival. Because the table shows how the number of individuals belonging to one category is contingent on the other category, such a table is called a contingency table. Another term for it is two-way table because the numbers are tallied according to two criteria, in our example the categories “class” and “survival”.
If we want to see the fraction of cases in each table cell, we can divide the
numbers in the two-way table by nrow(titanic).
table(titanic$class, titanic$survived) / nrow(titanic)
##
##             FALSE       TRUE
##   1st  0.05117754 0.07835145
##   2nd  0.06974638 0.05298913
##   3rd  0.23913043 0.08197464
##   Crew 0.31748188 0.10914855
The output shows that, for example, approximately 5.1% of all people on board were in first class and died, while 7.8% were in first class and survived, and so on.
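To see how the two-way table relates to the one-way table, the following sketch rebuilds the contingency table from the counts printed above instead of from the titanic data frame. The names counts and tab are hypothetical, and the sketch assumes the data contain no missing values, so that the cell counts add up to nrow(titanic). The functions as.table(), prop.table(), rowSums() and addmargins() are all part of base R.

```r
# Counts copied from the contingency table in this tutorial
counts <- matrix(
  c(113, 173,
    154, 117,
    528, 181,
    701, 241),
  nrow = 4, byrow = TRUE,
  dimnames = list(class = c("1st", "2nd", "3rd", "Crew"),
                  survived = c("FALSE", "TRUE"))
)
tab <- as.table(counts)

# Fraction of all people on board in each cell
# (equivalent to dividing by nrow(titanic) under the assumption above)
cell_share <- prop.table(tab)
round(cell_share, 3)

# Summing each row recovers the one-way frequency table of class
rowSums(tab)

# addmargins() appends the row and column totals to the table
addmargins(tab)
```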
By the way, if we compare our numbers with Table 2.4 in our textbook, we notice that they differ slightly. The reason is that the numbers come from different sources. The textbook’s data are from a 1990 reprint of the British Board of Trade’s inquiry report.[1] The data I supplied to you are from the Encyclopedia Titanica.[2] It’s difficult to judge why the data differ. Both sources appear trustworthy. Encyclopedia Titanica is probably more up-to-date thanks to recent contributions by professional and amateur historians. The lesson to learn is that we must always clearly state our source so that others can replicate our data analysis.
To summarise the main point, we have seen how to create one-way and two-way
tables with the table()
function.
In the next tutorial, we’ll learn how to visualise data as a bar chart.
See you soon.
[1] British Board of Trade (1990), Report on the Loss of the ‘Titanic’ (S.S.). British Board of Trade Inquiry Report (reprint). Gloucester, UK: Allan Sutton Publishing.