Chapter 12 Bar Charts
Welcome back to Quantitative Reasoning!
In the previous tutorial, we learned how to produce numeric tables with R.
Looking at numbers is fine, but sometimes it’s more engaging to visualise the
results.
As our textbook points out, bar charts are a good general-purpose method to
visualise categorical data.
Bar charts are very easy to produce with R.
We simply pass the result of the table() function as an argument to the
barplot() function.
Last time, we created a one-way table of Titanic travellers with the table()
function.
table(titanic$class)
##
## 1st 2nd 3rd Crew
## 286 271 709 942
Let’s first assign the result of our one-way table to a variable, say one_way.
Let’s run this assignment so that one_way appears in the Environment tab.
one_way <- table(titanic$class)
Next we use the barplot() function with the argument one_way.
barplot(one_way)
When we run this command, the plot appears in the bottom right pane. By adjusting the pane size, we can stretch or shrink the plot until it looks pleasant.
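The steps so far can be tried without the course data. The following is a minimal sketch that assumes a small made-up data frame called toy (hypothetical, not the tutorial's titanic data) with a class column. It also stores the value that barplot() returns, which is a vector of bar midpoints on the horizontal axis, one per bar.

```r
# Hypothetical toy data standing in for the tutorial's titanic data frame
toy <- data.frame(
  class = c("1st", "1st", "2nd", "3rd", "3rd", "3rd", "Crew", "Crew")
)

# One-way table of counts per class
one_way <- table(toy$class)
print(one_way)

# Draw the bar chart; barplot() also returns the bar midpoints
midpoints <- barplot(one_way)
```

Each bar's height is the count in the corresponding cell of one_way, so the tallest bar here belongs to the 3rd class.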
We can also make a bar chart of a two-way table.
Let’s assign the result of the two-way table from the previous tutorial to a
variable called two_way.
First we run this assignment.
two_way <- table(titanic$class, titanic$survived)
Then we run barplot(two_way).
barplot(two_way)
We now get a segmented bar chart. That is, we get two bars: one bar for FALSE
(i.e. the passenger died) and another bar for TRUE (i.e. the passenger
survived).
These two bars are subdivided by class.
Unfortunately, it isn’t clear from this plot which colour corresponds to which
class.
But we can easily fix this problem.
We simply add another argument to the barplot() command: legend.text = TRUE.
barplot(two_way, legend.text = TRUE)
Depending on the size of the pane, the legend may overlap with the bars.
Usually this problem goes away if we make the pane sufficiently tall and run
the barplot() command again.
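With legend.text = TRUE, barplot() labels the legend with the row names of the table, which is why the classes show up in the legend here. The small sketch below, which assumes a made-up data frame toy (hypothetical, not the tutorial's data), shows where those labels come from.

```r
# Hypothetical toy data standing in for the tutorial's titanic data frame
toy <- data.frame(
  class    = c("1st", "1st", "2nd", "3rd", "3rd", "3rd", "Crew", "Crew"),
  survived = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Rows are classes, columns are FALSE/TRUE
two_way <- table(toy$class, toy$survived)

# Row names become the segments and legend labels;
# column names become the bars
print(rownames(two_way))
print(colnames(two_way))

barplot(two_way, legend.text = TRUE)
```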
In this plot, we have two bars, each divided into four segments.
Suppose we want the opposite.
That is, four bars with two segments each.
Here is what I have in mind.
There’s one bar for each class, each divided into segments that show how many
individuals in this class survived or died.
This plot is neither better nor worse than the plot we’ve already produced.
In fact, both of them contain the same numeric information.
But the plots emphasize different aspects of the data.
From our earlier plot, we can easily read off the total number of survivors and
the total number of people who died, simply by looking at the length of the two
bars.
In our new plot, we’ll be able to see more easily how many 1st-, 2nd- and
3rd-class passengers and crew members were on board.
To produce the new plot, we need to flip the rows and columns in our variable
two_way.
The easiest way to make the change is to swap the order of the two arguments
in the table() function so that survived comes before class.
Let’s assign the result to a new variable another_two_way.
another_two_way <- table(titanic$survived, titanic$class)
Then we run barplot() again with the new variable.
barplot(another_two_way, legend.text = TRUE)
You don’t need to memorise which order of the arguments produces which segmented bar chart. Trial and error is just fine.
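Swapping the arguments of table() is one way to flip rows and columns. Base R's t() function, which transposes a matrix or table, should give the same counts. It is not used in this tutorial, so treat the following as an optional sketch. The data frame toy is made up for illustration.

```r
# Hypothetical toy data standing in for the tutorial's titanic data frame
toy <- data.frame(
  class    = c("1st", "1st", "2nd", "3rd", "3rd", "3rd", "Crew", "Crew"),
  survived = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Class in the rows, survival in the columns
two_way <- table(toy$class, toy$survived)

# Survival in the rows, class in the columns
another_two_way <- table(toy$survived, toy$class)

# Check whether transposing the first table matches the second, cell by cell
print(all(t(two_way) == another_two_way))

# One bar per class, segmented by survival
barplot(another_two_way, legend.text = TRUE)
```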
The default position of the legend is quite unfortunate in our second bar plot.
The legend overlaps with one of the bars.
In this case, resizing the pane doesn’t solve the problem.
So we need a more creative solution.
We could read through the help document at ?barplot, but in this case
the built-in documentation isn’t very beginner-friendly.
Instead let’s search the World Wide Web for ways to move the legend to the
top left corner of the plot.
I’m googling the terms “legend position in barplot” and “R”.
The first returned web site looks promising:
https://r.789695.n4.nabble.com/legend-position-in-quot-barplot-quot-td3391490.html.
Somebody asked:
“Is there a way that I can control the position of the legend while using the
barplot() function?”
And the answer was: “Yes”.
In the answer, there’s an argument args.legend = list(x="topleft").
It looks complicated, and the details are indeed beyond our current knowledge,
but let’s copy and paste it and keep our fingers crossed.
barplot(another_two_way, legend.text = TRUE, args.legend = list(x="topleft"))
Great, this command works! The lesson to learn from this experience is that searching the web often brings up useful snippets of R code. I admit that, at the beginner’s level, it takes some patience to find the right tricks online. But you’ll get better with practice.
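For readers curious about what the copied argument does: args.legend is a list of extra arguments that barplot() hands on to R's legend() function, and x = "topleft" is one of the position keywords that legend() understands. The sketch below, which uses a made-up data frame toy (hypothetical), tries a different keyword and also adds bty = "n", a legend() argument that removes the box around the legend.

```r
# Hypothetical toy data standing in for the tutorial's titanic data frame
toy <- data.frame(
  class    = c("1st", "1st", "2nd", "3rd", "3rd", "3rd", "Crew", "Crew"),
  survived = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

another_two_way <- table(toy$survived, toy$class)

# args.legend is passed on to legend(); "top" centres the legend
# along the upper edge, and bty = "n" drops the surrounding box
barplot(another_two_way,
        legend.text = TRUE,
        args.legend = list(x = "top", bty = "n"))
```

Other keywords such as "topright" or "bottomleft" work the same way, so a little trial and error is usually enough to find a spot where the legend does not cover a bar.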
From our current bar chart, it’s easy to tell from the total length of each bar
how many travellers were in each class, but it isn’t immediately clear what
proportion of travellers in each class died or survived.
If we want to emphasize this feature more strongly, we can look at a dodged
bar chart, which our book calls a “side-by-side bar chart”.
We add the argument beside = TRUE to our previous command and insert a few
line breaks to avoid running past the recommended 80-character line length.
barplot(another_two_way,
        legend.text = TRUE,
        args.legend = list(x = "topleft"),
        beside = TRUE)
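To see the effect of beside = TRUE without the course data, here is a minimal sketch on a made-up data frame toy (hypothetical). With beside = TRUE, barplot() returns a matrix of bar midpoints with one entry per cell of the table, which reflects that each count now gets its own bar.

```r
# Hypothetical toy data standing in for the tutorial's titanic data frame
toy <- data.frame(
  class    = c("1st", "1st", "2nd", "3rd", "3rd", "3rd", "Crew", "Crew"),
  survived = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

another_two_way <- table(toy$survived, toy$class)

# Side-by-side bars: one group per class, one bar per survival outcome
midpoints <- barplot(another_two_way,
                     legend.text = TRUE,
                     args.legend = list(x = "topleft"),
                     beside = TRUE)

# One midpoint per bar: 2 survival outcomes x 4 classes
print(dim(midpoints))
```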
There are many more embellishments we may still want to add to our bar charts. For example, we can add a label to the vertical axis or change the colours. We’ll learn how to do this later in this course. But for now, let’s summarise what we learned in this tutorial.
- We can pass the result of the table() function as an argument to the
  barplot() function to visualise categorical data.
- barplot() accepts one-way and two-way tables as input.
- If the input is a two-way table, the bars are segmented.
- If we add the argument beside = TRUE to a segmented bar chart, the result is
  a side-by-side bar chart.
Bar charts are an excellent general-purpose method to summarise counts of categorical data such as passenger class or survival. Next time we’ll learn how to summarise quantitative data.
See you soon.