Chapter 11 Tables

Welcome back to Quantitative Reasoning! In the previous tutorials, we learned how to use logical vectors. With logical vectors we can find out, for example, how many passengers on the Titanic travelled in third class. We use the same strategy as last time: we pass the logical vector titanic$class == "3rd" as an argument to the sum() function.

sum(titanic$class == "3rd")
## [1] 709

Suppose we also want the corresponding numbers for all other classes. In principle, we could replace "3rd" in the previous command with each of the unique values in the class column. That is, we first run unique(titanic$class), a command we already saw in tutorial 06, and then replace "3rd" with each of the other elements in the output vector.

unique(titanic$class)
## [1] "3rd"  "Crew" "2nd"  "1st"

Unfortunately, this approach requires a lot of repeated code. We would have to replace "3rd" with "Crew", "2nd" and "1st" in turn. And we must be careful not to make a typo in any of the class names.
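
To make the problem concrete, here is a sketch of the repeated-code approach. The tutorial's titanic data frame is not reproduced in this text, so the sketch assumes a stand-in: a hypothetical data frame with only the class and survived columns, rebuilt from the class-by-survival counts shown in the contingency table later in this tutorial (one row per person on board, 2208 rows in total).

```r
# Hypothetical stand-in for the tutorial's titanic data frame: only the
# class and survived columns, rebuilt from this tutorial's class-by-survival
# counts in the order 1st/FALSE, 1st/TRUE, 2nd/FALSE, ..., Crew/TRUE.
counts <- c(113, 173, 154, 117, 528, 181, 701, 241)
titanic <- data.frame(
  class    = rep(rep(c("1st", "2nd", "3rd", "Crew"), each = 2), times = counts),
  survived = rep(rep(c(FALSE, TRUE), times = 4), times = counts)
)

# Repeated approach: one sum() call per class, each label typed by hand.
n_3rd  <- sum(titanic$class == "3rd")
n_crew <- sum(titanic$class == "Crew")
n_2nd  <- sum(titanic$class == "2nd")
n_1st  <- sum(titanic$class == "1st")
c(n_3rd, n_crew, n_2nd, n_1st)
## [1] 709 942 271 286

# A typo does not cause an error: "crew" matches no row, so the count is silently 0.
sum(titanic$class == "crew")
## [1] 0
```

The silent zero in the last command is exactly the kind of mistake that is easy to overlook when the same line is copied and edited several times.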

In this tutorial, we’ll learn a much more convenient method to summarise this information: the table() function. Thanks to the table() function, we can easily construct two types of tables that are presented in chapter 2 of our textbook: frequency tables and contingency tables.

In our example, a single function call tells us how many people are in each category of the class column.

table(titanic$class)  # Frequency table
## 
##  1st  2nd  3rd Crew 
##  286  271  709  942

The table shows in compact form that there were 286 first-class passengers, 271 second-class passengers and so on. If we want the relative frequencies instead of the absolute numbers, we simply divide by the total number of people on the Titanic, that is, by nrow(titanic).

table(titanic$class) / nrow(titanic)  # Relative frequency table
## 
##       1st       2nd       3rd      Crew 
## 0.1295290 0.1227355 0.3211051 0.4266304

Now we can easily see that around 13% of the individuals on board were first-class passengers, 12% second-class passengers and so on.
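
As a quick check of the relative frequency table, the fractions over all four classes should add up to 1, and rounding the percentages should reproduce the figures just quoted. The sketch below again uses a hypothetical stand-in for titanic, rebuilt from this tutorial's class-by-survival counts. It also shows base R's prop.table(), which divides a table by its own total; that total equals nrow(titanic) only as long as the class column has no missing values, because table() leaves NA out by default.

```r
# Hypothetical stand-in for the tutorial's titanic data frame (class and
# survived columns only), rebuilt from this tutorial's class-by-survival counts.
counts <- c(113, 173, 154, 117, 528, 181, 701, 241)
titanic <- data.frame(
  class    = rep(rep(c("1st", "2nd", "3rd", "Crew"), each = 2), times = counts),
  survived = rep(rep(c(FALSE, TRUE), times = 4), times = counts)
)

rel_freq <- table(titanic$class) / nrow(titanic)

# The relative frequencies of all categories add up to 1.
sum(rel_freq)
## [1] 1

# Percentages rounded to whole numbers: 13, 12, 32 and 43.
round(100 * rel_freq)

# prop.table() divides by the table's own total and gives the same fractions here.
prop.table(table(titanic$class))
```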

Both of the tables we’ve just produced are called one-way tables because they split the observations according to a single variable, in our case into the categories of the column class. But we can also split the observations by two variables simultaneously. For example,

table(titanic$class, titanic$survived)  # Contingency table
##       
##        FALSE TRUE
##   1st    113  173
##   2nd    154  117
##   3rd    528  181
##   Crew   701  241

shows how the people on board are split by class and survival. Because the table shows how the number of individuals in each category of one variable is contingent on the category of the other variable, such a table is called a contingency table. Another term for it is two-way table, because the numbers are tallied according to two criteria, in our example the variables “class” and “survival”.
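
A contingency table contains the one-way frequency table inside it: adding up each row over both survival outcomes gives back the class counts, and adding up each column gives the total numbers of deaths and survivors. The sketch below demonstrates this with rowSums(), colSums() and addmargins() (the last one comes from the stats package, which R attaches by default), again using a hypothetical stand-in for titanic rebuilt from the counts in the table above.

```r
# Hypothetical stand-in for the tutorial's titanic data frame (class and
# survived columns only), rebuilt from the class-by-survival counts above.
counts <- c(113, 173, 154, 117, 528, 181, 701, 241)
titanic <- data.frame(
  class    = rep(rep(c("1st", "2nd", "3rd", "Crew"), each = 2), times = counts),
  survived = rep(rep(c(FALSE, TRUE), times = 4), times = counts)
)

two_way <- table(titanic$class, titanic$survived)

# Row sums: 286, 271, 709 and 942, the same counts as table(titanic$class).
rowSums(two_way)

# Column sums: 1496 deaths (FALSE) and 712 survivors (TRUE).
colSums(two_way)

# The same table with an extra "Sum" row and column holding these totals.
addmargins(two_way)
```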

If we want to see the fraction of cases in each table cell, we can divide the numbers in the two-way table by nrow(titanic).

table(titanic$class, titanic$survived) / nrow(titanic)  # Relative contingency table
##       
##             FALSE       TRUE
##   1st  0.05117754 0.07835145
##   2nd  0.06974638 0.05298913
##   3rd  0.23913043 0.08197464
##   Crew 0.31748188 0.10914855

The output shows that, for example, approximately 5.1% of all people on board were first-class passengers who died and 7.8% were first-class passengers who survived, and so on.
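
The percentages quoted above can be read off the relative table by indexing it with row and column names, in the order [class, survived]. The sketch below, again on a hypothetical stand-in for titanic rebuilt from this tutorial's counts, also checks that all eight cell fractions add up to 1 and that each row adds up to that class's relative frequency in the one-way table (for example, about 13% for first class).

```r
# Hypothetical stand-in for the tutorial's titanic data frame (class and
# survived columns only), rebuilt from this tutorial's class-by-survival counts.
counts <- c(113, 173, 154, 117, 528, 181, 701, 241)
titanic <- data.frame(
  class    = rep(rep(c("1st", "2nd", "3rd", "Crew"), each = 2), times = counts),
  survived = rep(rep(c(FALSE, TRUE), times = 4), times = counts)
)

two_way_rel <- table(titanic$class, titanic$survived) / nrow(titanic)

# All eight cell fractions add up to 1.
sum(two_way_rel)
## [1] 1

# Cells are indexed as [class, survived]; the survived labels are the strings "FALSE" and "TRUE".
round(100 * two_way_rel["1st", "FALSE"], 1)  # first class, died
## [1] 5.1
round(100 * two_way_rel["1st", "TRUE"], 1)   # first class, survived
## [1] 7.8

# Each row adds up to that class's share of everyone on board.
rowSums(two_way_rel)
```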

By the way, if we compare our numbers with Table 2.4 in our textbook, we notice that they differ slightly. The reason is that the numbers come from different sources. The textbook’s data are from the British Board of Trade’s inquiry report, reprinted in 1990.[1] The data I supplied to you are from Encyclopedia Titanica.[2] It is difficult to judge why the two sources disagree, and both appear trustworthy. Encyclopedia Titanica is probably more up to date thanks to recent contributions by professional and amateur historians. The lesson to learn is that we must always clearly state our sources so that others can replicate our data analysis.

To summarise the main point, we have seen how to create one-way and two-way tables with the table() function. In the next tutorial, we will learn how to visualise data as a bar chart.

See you soon.


1. British Board of Trade (1990), Report on the Loss of the ‘Titanic’ (S.S.). British Board of Trade Inquiry Report (reprint). Gloucester, UK: Allan Sutton Publishing.

2. Encyclopedia Titanica, https://www.encyclopedia-titanica.org