BW #104: Aviation accidents
Get better at: Dates and times, memory optimization, plotting, and string optimization
![BW #104: Aviation accidents](/content/images/size/w1200/2025/02/DALL-E-2025-02-06-01.05.04---Realistic-panda-bears-working-as-air-traffic-controllers-in-the-tower-of-an-airport.-They-are-wearing-aviation-headsets-and-professional-uniforms--mon.webp)
It has been just over a week since a passenger plane and army helicopter collided in Washington, DC, resulting in 67 deaths. (Here's a Washington Post article with some updates: https://www.washingtonpost.com/dc-md-va/2025/02/04/dca-crash-victims-recovered/) The crash was shocking and tragic, of course – but commentators stressed that air travel remains extremely safe, and that these incidents are less common than they used to be.
So this week, we'll look at aviation accidents to see when and where they take place, and whether the numbers have indeed changed over time. Note that these are the problems that arose; we don't know what proportion of all flights they represent. But we can at least learn something about the incidents that are reported – and, of course, a lot about data analysis with Pandas, too.
Data and six questions
Our data comes from the National Transportation Safety Board (NTSB), which investigates transportation accidents. I got the data in CSV format by going to the "aviation investigation search" page (https://www.ntsb.gov/Pages/AviationQueryV2.aspx).
I decided to cast the widest possible net, and thus submitted my query without narrowing down any of the fields. That brought me to a results page; at the bottom of the page were buttons to download the data in JSON and CSV format. I chose CSV.
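Here is a minimal sketch of loading the downloaded file and measuring how much memory smaller dtypes save. The only column name it relies on is `EventDate`. The function names (`load_with_dates`, `shrink_dtypes`, `memory_saved`), the rule of converting a text column to `category` when at most half its values are distinct, and the choice to downcast only integer columns are hypothetical illustrations. They are not necessarily the approach used in the solutions.

```python
import pandas as pd


def load_with_dates(source, date_col="EventDate"):
    """Read the NTSB CSV export, parsing date_col as a datetime64 column."""
    # low_memory=False parses the file in one pass, avoiding mixed-type
    # columns that chunked dtype inference can produce
    return pd.read_csv(source, parse_dates=[date_col], low_memory=False)


def shrink_dtypes(df, max_unique_ratio=0.5):
    """Return a copy with smaller dtypes (hypothetical heuristic)."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        is_text = pd.api.types.is_object_dtype(s.dtype) or isinstance(
            s.dtype, pd.StringDtype
        )
        if is_text:
            # Repeated strings are cheaper as category: each distinct value
            # is stored once, and rows hold small integer codes
            if s.nunique(dropna=True) <= max_unique_ratio * len(s):
                out[col] = s.astype("category")
        elif pd.api.types.is_integer_dtype(s.dtype):
            # Pick the smallest integer type that holds every value (e.g. int8)
            out[col] = pd.to_numeric(s, downcast="integer")
    return out


def memory_saved(before, after):
    """Bytes saved; deep=True counts string contents, not just pointers."""
    return int(
        before.memory_usage(deep=True).sum() - after.memory_usage(deep=True).sum()
    )
```

For example, `df = load_with_dates("ntsb-aviation.csv")` followed by `memory_saved(df, shrink_dtypes(df))` reports the bytes saved (the filename is hypothetical). Passing `deep=True` matters: without it, pandas counts only the 8-byte pointers in object columns and understates how much memory string columns use.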
I have six tasks and questions for you this week. Our learning goals include:
- Memory optimization (https://www.bambooweekly.com/tag/memory-optimization/)
- Working with dates and times (https://www.bambooweekly.com/tag/datetime/)
- Plotting (https://www.bambooweekly.com/tag/plotting/)
- String manipulation (https://www.bambooweekly.com/tag/strings/)
A link to the data file is at the bottom of this post. I'll be back tomorrow with my complete solutions, including (for paid subscribers) the Jupyter notebook I used to solve the problems, in both downloadable format and in a one-click format that opens in Google Colab.
Here are my six questions:
- Create a data frame from the CSV file. Make sure that `EventDate` is treated as a `datetime` value. How much memory can you save by adjusting the dtypes from the defaults?
- Display a bar graph showing, for each 5-year period in our data set, the number of flights in the United States with fatalities. Do we see a trend?