Chapter 12 Bar Charts

Welcome back to Quantitative Reasoning! In the previous tutorial, we learned how to produce numeric tables with R. Looking at numbers is fine, but sometimes it’s more engaging to visualise the results. As our textbook points out, bar charts are a good general-purpose method to visualise categorical data. Bar charts are very easy to produce with R. We simply pass the result of the table() function as an argument to the barplot() function.

Last time, we created a one-way table of Titanic travellers with the table() function.

table(titanic$class)
## 
##  1st  2nd  3rd Crew 
##  286  271  709  942

Let’s first assign the result of our one-way table to a variable, say one_way. Let’s run this assignment so that one_way appears in the Environment tab.

one_way <- table(titanic$class)

Next we use the barplot() function with the argument one_way.

barplot(one_way)

When we run this command, the plot appears in the bottom right pane. By adjusting the pane size, we can stretch or shrink the plot until it looks pleasant.
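If the titanic data frame isn’t loaded in your session, here is a minimal sketch that rebuilds the one-way table from the counts printed above and plots it. The name one_way_demo is made up for this illustration. The category names stored in the table become the labels under the bars.

```r
# Counts copied from the table(titanic$class) output shown earlier.
# one_way_demo is a hypothetical name, chosen so that one_way is not overwritten.
one_way_demo <- as.table(c("1st" = 286, "2nd" = 271, "3rd" = 709, "Crew" = 942))

names(one_way_demo)    # the category names, used as bar labels
barplot(one_way_demo)  # one bar per category, bar height = count
```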

We can also make a bar chart of a two-way table. Let’s assign the result of the two-way table from the previous tutorial to a variable called two_way. First we run this assignment.

two_way <- table(titanic$class, titanic$survived)

Then we run barplot(two_way).

barplot(two_way)

We now get a segmented bar chart. That is, we get two bars: one bar for FALSE (i.e. the passenger died) and another bar for TRUE (i.e. the passenger survived). These two bars are subdivided by class.
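To see why we get two bars, it helps to look at the shape of the table: barplot() draws one bar for each column and stacks the rows inside it. The sketch below checks this. It assumes the titanic data frame from this tutorial. If that data frame isn’t loaded, the code builds a stand-in from R’s built-in Titanic dataset, which is an assumption made only for illustration and whose counts differ from ours.

```r
# Stand-in data (assumption): only built if the tutorial's titanic is missing.
# It expands R's built-in Titanic counts to one row per person.
if (!exists("titanic")) {
  counts <- as.data.frame(Titanic)
  people <- counts[rep(seq_len(nrow(counts)), counts$Freq), ]
  titanic <- data.frame(class = people$Class,
                        survived = people$Survived == "Yes")
}

two_way <- table(titanic$class, titanic$survived)

dim(two_way)       # number of rows (classes) and columns (FALSE, TRUE)
colnames(two_way)  # the labels of the two bars
colSums(two_way)   # the total height of each bar

barplot(two_way)   # one bar per column, one segment per row
```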

Unfortunately, it isn’t clear from this plot which colour corresponds to which class. But we can easily fix this problem. We simply add another argument to the barplot() command: legend.text = TRUE.

barplot(two_way, legend.text = TRUE)

Depending on the size of the pane, the legend may overlap with the bars. Usually this problem goes away when we make the pane sufficiently tall and run the barplot() command again.

In this plot, we have two bars, each divided into four segments. Suppose we want the opposite: four bars with two segments each. Here is what I have in mind.

There’s one bar for each class, each divided into segments that show how many individuals in this class survived or died. This plot is neither better nor worse than the plot we’ve already produced. In fact, both contain the same numeric information, but they emphasise different aspects of the data. From our earlier plot, we can easily read off the total number of survivors and the total number of people who died, simply by looking at the length of the two bars. In our new plot, we’ll be able to see more easily how many 1st-, 2nd- and 3rd-class passengers and crew members were on board.

To produce the new plot, we need to flip the rows and columns in our variable two_way. The easiest way to make the change is to swap the order of the two arguments in the table() function so that survived comes before class. Let’s assign the result to a new variable another_two_way. Then we run barplot() again with the new variable.
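Written out, these two steps might look like the sketch below. It assumes the titanic data frame from this tutorial. If that data frame isn’t loaded, the code builds a stand-in from R’s built-in Titanic dataset, which is an assumption made only for illustration and whose counts differ from ours. The last lines show t(), which flips the rows and columns of an existing table, as an alternative to swapping the arguments.

```r
# Stand-in data (assumption): only built if the tutorial's titanic is missing.
if (!exists("titanic")) {
  counts <- as.data.frame(Titanic)
  people <- counts[rep(seq_len(nrow(counts)), counts$Freq), ]
  titanic <- data.frame(class = people$Class,
                        survived = people$Survived == "Yes")
}

# survived first, class second: rows = survived, columns = class.
another_two_way <- table(titanic$survived, titanic$class)
barplot(another_two_way, legend.text = TRUE)

# Alternative: flip the class-by-survived table with t().
two_way <- table(titanic$class, titanic$survived)
all(t(two_way) == another_two_way)  # TRUE when both approaches agree
```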

You don’t need to memorise in which order the arguments must be to produce one or the other segmented bar chart. Trial-and-error is just fine.

The default position of the legend is quite unfortunate in our second bar plot. The legend overlaps with one of the bars. In this case, resizing the pane doesn’t solve the problem, so we need a more creative solution.

We could read through the help page at ?barplot, but in this case the built-in documentation isn’t very beginner-friendly. Instead, let’s search the web for ways to move the legend to the top left corner of the plot. I’m googling the terms “legend position in barplot” and “R”. The first result looks promising: https://r.789695.n4.nabble.com/legend-position-in-quot-barplot-quot-td3391490.html. Somebody asked: “Is there a way that I can control the position of the legend while using the barplot() function?” And the answer was: “Yes”. The answer contains the argument args.legend = list(x="topleft"). It looks complicated, and the details are indeed beyond our current knowledge, but let’s copy and paste it and keep our fingers crossed.

barplot(another_two_way, legend.text = TRUE, args.legend = list(x="topleft"))

Great, this command works! The lesson to learn from this experience is that searching the web often brings up useful snippets of R code. I admit that, at the beginner’s level, it takes some patience to find the right tricks online. But you’ll get better with practice.
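For the curious: according to ?barplot, args.legend is a list of extra arguments that barplot() hands on to the legend() function, so anything legend() understands can go into that list. The sketch below tries a different position keyword and two further legend() arguments, bty and title. These particular choices are only for illustration. The code assumes the titanic data frame from this tutorial and, if it isn’t loaded, builds a stand-in from R’s built-in Titanic dataset, whose counts differ from ours.

```r
# Stand-in data (assumption): only built if the tutorial's titanic is missing.
if (!exists("titanic")) {
  counts <- as.data.frame(Titanic)
  people <- counts[rep(seq_len(nrow(counts)), counts$Freq), ]
  titanic <- data.frame(class = people$Class,
                        survived = people$Survived == "Yes")
}
another_two_way <- table(titanic$survived, titanic$class)

# Each name in this list must be an argument of legend().
legend_args <- list(x = "top",           # position keyword, like "topleft"
                    bty = "n",           # no box around the legend
                    title = "survived")  # heading above the legend entries

barplot(another_two_way, legend.text = TRUE, args.legend = legend_args)
```

Other position keywords accepted by legend() include "topright", "bottomleft" and "center", so trial and error works here too.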

From our current bar chart, the total length of each bar tells us how many travellers were in each class, but it isn’t immediately clear how the number who died compares with the number who survived in each class. If we want to emphasise this feature more strongly, we can use a dodged bar chart, which our book calls a “side-by-side bar chart”. We add the argument beside = TRUE to our previous command and insert a few line breaks to stay within the recommended 80-character line length.

barplot(another_two_way,
        legend.text = TRUE,
        args.legend = list(x="topleft"),
        beside = TRUE)
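The side-by-side chart still shows counts. If we want the actual proportions within each class, one option (not covered in this tutorial) is to convert the table with prop.table() before plotting. Here is a minimal sketch. It assumes the titanic data frame from this tutorial and, if it isn’t loaded, builds a stand-in from R’s built-in Titanic dataset, whose counts differ from ours. The name class_shares and the ylim setting are choices made for this illustration.

```r
# Stand-in data (assumption): only built if the tutorial's titanic is missing.
if (!exists("titanic")) {
  counts <- as.data.frame(Titanic)
  people <- counts[rep(seq_len(nrow(counts)), counts$Freq), ]
  titanic <- data.frame(class = people$Class,
                        survived = people$Survived == "Yes")
}
another_two_way <- table(titanic$survived, titanic$class)

# margin = 2 divides each column by its total, so every class sums to 1.
class_shares <- prop.table(another_two_way, margin = 2)
round(class_shares, 2)

# ylim leaves some room above the bars for the legend.
barplot(class_shares,
        legend.text = TRUE,
        args.legend = list(x = "top", horiz = TRUE),
        beside = TRUE,
        ylim = c(0, 1))
```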

There are many more embellishments we may still want to add to our bar charts. For example, we can add a label to the vertical axis or change the colours. We will learn how to do this later in this course. But for now, let’s summarise what we learned in this tutorial.

  • We can pass the result of the table() function as an argument to the barplot() function to visualise categorical data.
  • barplot() accepts one-way and two-way tables as input.
  • If the input is a two-way table, the bars are segmented by default: one bar per column of the table, one segment per row.
  • If we add the argument beside = TRUE to the barplot() command for a two-way table, the result is a side-by-side bar chart.

Bar charts are an excellent general-purpose method to summarise counts of categorical data such as passenger class or survival. Next time we will learn how to summarise quantitative data.

See you soon.