BW #48: Aviation accidents
Last week's airplane accidents remind us that aviation can be dangerous. But how dangerous is it, and hasn't air travel gotten much safer over the years? This week, we calm ourselves down with data.
Twice in the past week, airline accidents have made the news: First, a Japanese flight landed on a runway that was mistakenly occupied by another plane (https://www.msn.com/en-us/news/us/japan-coast-guard-plane-not-cleared-for-takeoff-before-deadly-runway-crash-air-traffic-control-transcript-suggests/ar-AA1mq1em). Then, several days ago, an Alaska Airlines flight took off, only to have part of the airplane's wall break off, leaving a big hole (https://www.msn.com/en-us/travel/news/alaska-airlines-passenger-describes-terrifying-flight-to-california-there-was-a-hole-in-the-plane/ar-AA1mJDHZ).
These incidents were remarkable because no one aboard either airliner was killed or even seriously injured. (The crew of the Coast Guard plane in Japan weren't so lucky, unfortunately.) They're also a reminder of just how safe air travel is, given the number of flights in the sky. That's due, in no small part, to the redundancy built into all of the technology, as well as the checks and double-checks that happen on a regular basis.
Of course, sometimes things can go wrong, as Tim Harford describes in this episode of his "Cautionary Tales" podcast: https://timharford.com/2022/09/cautionary-tales-when-the-plane-ran-out-of-fuel/
Even though accidents, let alone disasters, have become increasingly rare over the last few decades, they are far from unknown. In the US, the National Transportation Safety Board (NTSB, at https://www.ntsb.gov/Pages/home.aspx) investigates incidents and then issues recommendations to ensure that problems don't repeat themselves.
I've heard that air travel has become significantly safer in the last few decades, but is that true? In the wake of last week's accidents, I decided that we should take a look and find out for ourselves.
Data and seven questions
This week's data comes from the NTSB's "CAROL" database (https://data.ntsb.gov/carol-main-public/basic-search), which lets you query all sorts of transportation incidents. You can then download the result of your query in either JSON or CSV format.
This week, I want to look at data on aviation accidents from January 1st, 1998 through today. This will require going to the CAROL search site, choosing "Aviation" as the mode, and then downloading five separate JSON files. (That's necessary because the CAROL site won't let you download more than 10,000 records at a time.) I chunked the data into these date ranges:
- 01/01/2018 to 12/31/2024
- 01/01/2012 to 12/31/2017
- 01/01/2007 to 12/31/2011
- 01/01/2002 to 12/31/2006
- 01/01/1998 to 12/31/2001
(Notice that the dates are in month/day/year format, since the NTSB is based in the US.)
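Because the chunks have to line up exactly, it's worth a quick sanity check that each range starts the day after the previous one ends. Here's a minimal sketch using only Python's standard library; the names `RANGES`, `parse_us_date`, and `find_breaks` are my own, not part of any NTSB tool. Since the dates are month/day/year, they're parsed with the `%m/%d/%Y` format.

```python
from datetime import datetime, timedelta

# The five download ranges listed above, in month/day/year (US) format
RANGES = [
    ("01/01/2018", "12/31/2024"),
    ("01/01/2012", "12/31/2017"),
    ("01/01/2007", "12/31/2011"),
    ("01/01/2002", "12/31/2006"),
    ("01/01/1998", "12/31/2001"),
]


def parse_us_date(text):
    """Parse a month/day/year string such as '12/31/2001' into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def find_breaks(ranges):
    """Return (end, next_start) pairs where a range doesn't begin the day after the previous one ends.

    This flags both gaps (missing days) and overlaps (duplicated days).
    """
    # Sort chronologically, so the ranges can be listed in any order
    parsed = sorted((parse_us_date(start), parse_us_date(end)) for start, end in ranges)
    breaks = []
    for (_, prev_end), (next_start, _) in zip(parsed, parsed[1:]):
        if next_start != prev_end + timedelta(days=1):
            breaks.append((prev_end, next_start))
    return breaks


print(find_breaks(RANGES))  # → []
```

Sorting first means the newest-first order used above doesn't matter, and an empty list means the five ranges cover 1998 through 2024 with no gaps or overlaps.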
Downloading the JSON actually gave me a zip file containing the data in JSON format, as well as a (not very useful) "readme" file describing the query used to download the data.
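You don't necessarily have to unzip each file by hand. pandas can read a zip-compressed file directly, but only when the archive contains a single file, so the readme gets in the way. Here's a minimal sketch using Python's standard `zipfile` and `json` modules instead. It assumes (I haven't checked every download) that each archive holds exactly one member whose name ends in `.json`; the function name `load_json_from_zip` is hypothetical.

```python
import json
import zipfile


def load_json_from_zip(zip_path):
    """Return the parsed contents of the single .json member of a zip file.

    Assumes the archive holds exactly one file ending in '.json'; any other
    members (such as a readme) are ignored.
    """
    with zipfile.ZipFile(zip_path) as zf:
        json_names = [name for name in zf.namelist() if name.lower().endswith(".json")]
        if len(json_names) != 1:
            raise ValueError(f"Expected one JSON file in {zip_path}, found {json_names}")
        # zf.open returns a binary file; json.load accepts UTF-8 bytes
        with zf.open(json_names[0]) as f:
            return json.load(f)
```

Calling `load_json_from_zip("carol_1998_2001.zip")` (a made-up filename) would return the parsed records. If the top level of the JSON is a list of records, you can then hand the result to `pd.DataFrame`.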
This week, I have seven tasks and questions for you. The learning goals for this week include working with JSON files, handling complex data inside a column, grouping, unstacking, plotting, and working with datetime data. I'll share my detailed solutions, including the Jupyter notebook I used to answer the questions, tomorrow.
Here, then, are today's questions:
- Combine the five JSON files into a single data frame. You will only need the following columns: `cm_mkey`, `cm_eventDate`, `cm_vehicles`, `cm_fatalInjuryCount`, `cm_seriousInjuryCount`, and `cm_minorInjuryCount`. Treat the `cm_eventDate` column as a date. Set the `cm_mkey` column to be the index. How can you be sure that you downloaded all of the files, covering all years?
- Count the number of vehicles involved in each incident. How often does each count occur?
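Here's a minimal sketch of one possible approach to the first task. It assumes the zip files have already been extracted and that each JSON file holds a top-level array of records; the names `combine_carol_files`, `incidents_per_year`, `COLUMNS`, and `MAX_RECORDS` are hypothetical. Two checks address the "did I get everything?" question: a warning when a file reaches the 10,000-record download limit, which could mean it was cut off, and a per-year count of incidents, where a missing or suspiciously small year suggests a missing file.

```python
import pandas as pd

# The columns the first question asks for
COLUMNS = [
    "cm_mkey",
    "cm_eventDate",
    "cm_vehicles",
    "cm_fatalInjuryCount",
    "cm_seriousInjuryCount",
    "cm_minorInjuryCount",
]

# The CAROL site's limit on records per download, as described above
MAX_RECORDS = 10_000


def combine_carol_files(json_paths):
    """Read each JSON file, keep only COLUMNS, and stack the results into one data frame."""
    frames = []
    for path in json_paths:
        frame = pd.read_json(path)[COLUMNS]
        # A file that reaches the download limit may have been cut off
        if len(frame) >= MAX_RECORDS:
            print(f"Warning: {path} has {len(frame)} rows and may be truncated")
        frames.append(frame)
    df = pd.concat(frames, ignore_index=True)
    # Treat the event date as a datetime rather than a string
    df["cm_eventDate"] = pd.to_datetime(df["cm_eventDate"])
    return df.set_index("cm_mkey")


def incidents_per_year(df):
    """Count incidents per calendar year; a missing or tiny year hints at a missing file."""
    return df["cm_eventDate"].dt.year.value_counts().sort_index()
```

For example, `df = combine_carol_files(sorted(Path("carol").glob("*.json")))` (with `from pathlib import Path`, and `carol` standing in for wherever you extracted the files), followed by `incidents_per_year(df)`, should list every year from 1998 onward.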
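For the second task, the `cm_vehicles` column is an example of complex data inside a column. I'm assuming here that each cell holds a list with one entry per vehicle. If so, pandas' `.str.len()` works on lists as well as strings, and `value_counts` then tallies how often each count appears. The function name `vehicle_count_frequencies` and the three-row sample are made up for illustration; they aren't real NTSB records.

```python
import pandas as pd


def vehicle_count_frequencies(df):
    """Count the vehicles in each incident, then tally how often each count occurs.

    Assumes each cm_vehicles cell holds a list with one entry per vehicle.
    """
    # .str.len() also works on list elements, returning NaN for missing values
    vehicles_per_incident = df["cm_vehicles"].str.len()
    # Sort by vehicle count rather than by frequency
    return vehicles_per_incident.value_counts().sort_index()


# Tiny made-up example, not real NTSB data
sample = pd.DataFrame(
    {"cm_vehicles": [[{"id": 1}], [{"id": 2}, {"id": 3}], [{"id": 4}]]},
    index=pd.Index([101, 102, 103], name="cm_mkey"),
)
print(vehicle_count_frequencies(sample))
```

On the sample, this reports two incidents involving one vehicle and one incident involving two. Sorting by the index, rather than by frequency, makes the distribution easier to read.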