BW #46: Pedestrians
Are pedestrian deaths really rising in America, when they're declining in other countries? This week, we look at some of the data regarding traffic accidents and pedestrians.
I love to walk. I take a long walk at dawn every morning, passing through several local parks. When I travel, I enjoy exploring other cities and countries on foot. Sure, I could get places faster in other ways, but walking lets you take in the local scenery and get a good sense of the place. When I did some corporate training in San Jose, California, my client was rather surprised to hear that I hadn't rented a car, and that I instead planned to walk 30 minutes from my hotel to their office each morning. (I think that I was literally the only person who walked to work in Silicon Valley that week.)
I was thus surprised and disappointed to hear several variations on the same news story recently, all pointing to the fact that pedestrian deaths in the United States have been rising over the last few years, at the same time as they have been falling in other countries. This was most prominently covered in the New York Times (https://www.nytimes.com/interactive/2023/12/11/upshot/nighttime-deaths.html?unlocked_article_code=1.JE0.6nDD.DwDlgNGWvbej&smid=url-share), but it was also mentioned on Slate's Political Gabfest (https://slate.com/podcasts/political-gabfest/2023/07/extreme-weather-heat-and-floods-are-killing-us-political-gabfest), and was previously the subject of a Vox article (https://www.vox.com/23784549/pedestrian-deaths-traffic-safety-fatalities-governors-association).
One reason for the higher fatalities, according to this analysis, is the growth of "stroads." To be honest, I never really thought about the difference between a "street" and a "road," but it seems that urban planners see them as totally different: Streets are where people do things with one another, including buying things and congregating. Roads, by contrast, are where cars can go at higher speeds, unimpeded by too many pedestrians. The term "stroad" (https://en.wikipedia.org/wiki/Stroad) refers to a hybrid of the two: a thoroughfare lined with the places people want or need to visit in person, such as supermarkets and movie theaters, but with little or no space for pedestrians. The stories basically say that people who walk along these stroads, especially at night, are at high risk of being injured or killed, and that other countries are more likely to make commercial areas accessible on foot.
For our last data set of 2023, we'll look at traffic accidents involving pedestrians in the United States. (I know, just the sort of happy topic you want to think about at the conclusion of the year, right?) We'll look at the conditions under which these accidents take place.
Data and six questions
Data about fatal accidents is collected and reported as part of the "Fatality Analysis Reporting System" (FARS), which is run by the National Highway Traffic Safety Administration, which is itself part of the US Department of Transportation:
https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars
The data is all available from the following site:
https://www.nhtsa.gov/content/nhtsa-ftp/251
If you go to that site, you'll see a folder for each year of FARS data, starting in 1975 and going through 2021, the latest year for which data has been published. Inside each year's folder are two sub-folders, one called "National" and the other "Puerto Rico." We'll look at the National data; if you go inside that sub-folder, you'll see four files: two with data in CSV format, and two with the data in SAS format. We'll look at the original (not auxiliary) CSV data, which means that if you're looking for data from 2021, you'll need to download:
https://www.nhtsa.gov/file-downloads?p=nhtsa/downloads/FARS/2021/National/FARS2021NationalCSV.zip
(Notice that the year appears twice in the URL, once as a folder and once as a filename.)
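Since the year is the only part of that URL that changes, it can be built with a small helper. This is a minimal sketch: the function name `fars_csv_url` is made up for illustration, and it assumes that every year's zipfile follows the same naming pattern as the 2021 URL above, which is worth checking against the site before relying on it.

```python
def fars_csv_url(year):
    """Build the download URL for one year's national CSV zipfile.

    Assumes every year follows the pattern of the 2021 URL, with the
    year appearing once as a folder and once in the filename.
    """
    return (
        "https://www.nhtsa.gov/file-downloads?p=nhtsa/downloads/FARS/"
        f"{year}/National/FARS{year}NationalCSV.zip"
    )


for year in (2010, 2021):
    print(fars_csv_url(year))
```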
If you unpack the zipfile, you'll find a large number of CSV files, each with a different slice of the full FARS data. We'll be interested in two files from each year, `accident.csv` (which lists all of the accidents that took place during that year) and `person.csv` (which lists all of the people involved in the accidents). Actually, those filenames aren't quite right; the filename and suffix are sometimes in ALL CAPS, sometimes Capitalized, and sometimes in all lowercase letters. Plus, sometimes the files sit inside a subdirectory within the zipfile, and sometimes they don't. What fun!
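One way to cope with the inconsistent names is to ignore both the directory and the capitalization when looking up a file inside the archive. The sketch below builds a tiny zipfile in memory to stand in for a real download; the folder name `FARS2015NationalCSV` and the helper `members_by_basename` are made up for illustration, and real archives may be laid out differently.

```python
from io import BytesIO
from pathlib import PurePosixPath
from zipfile import ZipFile


def members_by_basename(zf):
    """Map each member's lowercased basename to its real name in the archive."""
    return {
        PurePosixPath(name).name.lower(): name
        for name in zf.namelist()
        if not name.endswith("/")  # skip directory entries
    }


# Build a small in-memory archive that mimics the inconsistent layouts.
# (The folder name and the contents are invented for this example.)
buffer = BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("FARS2015NationalCSV/ACCIDENT.CSV", "ST_CASE\n10001\n")
    zf.writestr("FARS2015NationalCSV/Person.csv", "ST_CASE\n10001\n")

with ZipFile(buffer) as zf:
    lookup = members_by_basename(zf)

# Look files up by their "ideal" lowercase names, whatever the archive uses.
print(lookup["accident.csv"])
print(lookup["person.csv"])
```

If two members in different folders ever shared a basename, the later one would win in this dictionary, so it's worth printing `zf.namelist()` for a year or two to see what is actually there.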
The data dictionary that describes most (but not all!) of the columns in the data we'll be looking at is here:
https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813426
This data dictionary is more than 1,000 pages long (!), so you'll almost certainly want to search through it, rather than reading through the whole thing. I'll point you to the particular columns that we'll be analyzing.
This week, I have six questions and tasks for you. This week's learning goals include:
- Creating a data frame from multiple files
- Using the `requests` package, along with the `zipfile` module and the `io.BytesIO` class from Python's standard library
- Grouping and pivot tables
- Joining data frames together for queries
- Creating plots to visualize data
Here are this week's six questions:
- Create two data frames from the 2021 FARS data, one for accidents and one for people. Use the `requests` package (https://docs.python-requests.org/en/latest/index.html) plus the `zipfile` module and the `BytesIO` class (from the `io` module) in Python's standard library to retrieve and process these files, turning them into data frames.
- Once you have that working, create two data frames, `accident_df` and `person_df`, based on all of the data from 2010 through 2021. You'll need to:
  - Download the zipfile of CSV data for each year.
  - From that zipfile, extract the files `accident.csv` and `person.csv`, no matter how they're capitalized or whether they sit inside a subdirectory.
  - Read each CSV file into a data frame for that year.
  - Combine all of the annual data frames into `accident_df` (from the accident files) and `person_df` (from the person files).
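The steps in the first two questions could be sketched roughly as follows. This is one possible approach, not the only one: the helper names (`download_zip`, `read_member`, `load_years`) are made up for illustration, the URL template assumes every year follows the 2021 pattern, and any `read_csv` options you need (such as an encoding) are passed through rather than guessed at.

```python
from io import BytesIO
from pathlib import PurePosixPath
from zipfile import ZipFile

import pandas as pd

# Assumption: every year's zipfile follows the same pattern as the 2021 URL.
URL_TEMPLATE = (
    "https://www.nhtsa.gov/file-downloads?p=nhtsa/downloads/"
    "FARS/{year}/National/FARS{year}NationalCSV.zip"
)


def download_zip(year):
    """Fetch one year's zipfile and return its raw bytes."""
    import requests  # third-party; imported here so the rest runs offline

    response = requests.get(URL_TEMPLATE.format(year=year), timeout=120)
    response.raise_for_status()
    return response.content


def read_member(zip_bytes, wanted, **read_csv_kwargs):
    """Read the CSV whose basename matches `wanted`, ignoring case and folders."""
    with ZipFile(BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if PurePosixPath(name).name.lower() == wanted.lower():
                with zf.open(name) as f:
                    return pd.read_csv(f, **read_csv_kwargs)
    raise FileNotFoundError(f"{wanted} not found in archive")


def load_years(years, fetch=download_zip, **read_csv_kwargs):
    """Return (accident_df, person_df) stacked across all requested years."""
    accidents, people = [], []
    for year in years:
        zip_bytes = fetch(year)  # one download per year, reused for both files
        accidents.append(read_member(zip_bytes, "accident.csv", **read_csv_kwargs))
        people.append(read_member(zip_bytes, "person.csv", **read_csv_kwargs))
    return (
        pd.concat(accidents, ignore_index=True),
        pd.concat(people, ignore_index=True),
    )
```

With those in place, `accident_df, person_df = load_years(range(2010, 2022))` would download and combine all twelve years. If pandas complains about character encodings or mixed column types in some years, passing options such as `encoding="latin-1"` or `low_memory=False` through to `read_csv` may help, but treat those as things to try rather than known requirements. The `fetch` parameter exists so that the loop can be tested against in-memory zipfiles without touching the network.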