Chapter 17 Exporting a Data Frame to CSV

Welcome back to Quantitative Reasoning! In tutorial 07, we worked with the titanic data frame and converted ticket prices from pounds-shillings-pence to decimal notation. In essence, here is what we did back then. We imported the data from a CSV and added a new column called price.

titanic <- read.csv("~/QR/titanic/titanic.csv")
titanic$price <- titanic$pnd + titanic$shl / 20 + titanic$pnc / 240

We can see in RStudio’s spreadsheet viewer that the price column was added on the right. Suppose our colleagues need these additional data for their own work. How can we share a spreadsheet with them that contains this additional column?

One option is to export these data to a CSV file and share the file with our colleagues. Even if they aren’t familiar with R, they will still be able to perform their own analysis (e.g. with Excel or Python). In R, we can export a data frame to a file with the function write.csv(). Inside the parentheses, we must specify at least two arguments. The first argument is the name of the data frame we wish to export. The second argument is the name of the exported file, say titanic_with_price.csv.

write.csv(titanic, file = "titanic_with_price.csv")

In the Files tab in the bottom right pane, we now find the new CSV file. Let’s open it with Excel to take a peek. If Excel is your default spreadsheet program, you can click on “More” to the right of the cogwheel symbol. Then select “Show Folder in New Window” and double-click on titanic_with_price.csv. We can indeed find the price column on the right. And it isn’t the only new column. Compared to titanic.csv, there’s now also a new column on the left that contains the numbers “1”, “2”, “3” etc. This additional column may not cause any harm, but it may cause confusion if we import the new CSV into R and rely on the old column numbers to store the same information as the new column numbers, which are now off by one.

The recommended solution is to add one more argument to write.csv: row.names = FALSE.

write.csv(titanic, file = "titanic_with_price.csv", row.names = FALSE)

Let’s run this command and reopen the file with Excel. Now there isn’t any column with integer identifiers on the left any longer.

Great! We now know how to share the results of an R analysis with somebody else. But, looking at the Excel spreadsheet, we may be wondering what all these cells with “NA” are supposed to mean. If you’re curious, then watch the next tutorial.

In summary, we export a data frame to a CSV file with the write.csv() function. We pass the name of the data frame as the first argument and the file name as the second argument. It’s good practice to add a third argument row.names = FALSE.

Next time we lift the secret behind NA.

See you soon.