BW #68: Dangerously hot weather

Get better with: Extracting from PDF files, multi-indexes, comprehensions, cleaning data, regular expressions, using "apply", plotting, and finding correlations.

Summer is starting, at least in the northern hemisphere. I keep hearing people say that it's surprisingly hot out. That's certainly true in Israel, where we expect hot weather during the summer (and often during the spring and fall, too), but the rest of the world is also experiencing unusually hot summers. Just this week, the New York Times reported that heat-related deaths are an increasingly big problem – for workers, for employers who want to keep them safe on the job, and for the government, which wants to ensure a safe workplace. (You can read the article here: https://www.nytimes.com/2024/05/25/climate/extreme-heat-biden-workplace.html?unlocked_article_code=1.vk0.ekJ8.Kg0h3dcMGz9d&smid=url-share )

The article cited a number of sources, one of which was the National Weather Service (https://www.weather.gov/), which publishes statistics about various weather-related disasters and hazards:

https://www.weather.gov/hazstat/

As we start to enjoy (or not!) warmer weather, I thought it might be interesting to dig into this data, to see if heat-related fatalities are really increasing -- and if so, by how much.

Data and seven questions

On the National Weather Service's hazards list, there is a link to download the 80-year summary of all weather-related fatalities in the United States:

https://www.weather.gov/media/hazstat/80years_2023.pdf

As you can see from the file extension, it's a PDF file. You'll want to use the tabula-py (https://tabula-py.readthedocs.io/en/latest/) package to read its tables into Pandas data frames.
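
If you haven't used tabula-py before, here is a minimal sketch of how the loading step might look. The function name load_pdf_tables and the choice of pages="all" are my own assumptions for illustration, not necessarily how the solution will do it. Note that tabula-py depends on a Java runtime being installed, which is why the import sits inside the function.

```python
# URL taken from the post; the function name and arguments are illustrative assumptions.
PDF_URL = "https://www.weather.gov/media/hazstat/80years_2023.pdf"


def load_pdf_tables(source=PDF_URL):
    """Return every table tabula-py detects in the PDF, one data frame per table."""
    # Imported lazily: tabula-py needs a Java runtime on the machine,
    # so the import only fails when the function is actually called.
    import tabula

    # read_pdf accepts a local path or a URL. With pages="all" and
    # multiple_tables=True it returns a list of Pandas data frames.
    return tabula.read_pdf(source, pages="all", multiple_tables=True)
```

Calling load_pdf_tables() should give you a list of data frames. Look at the shape and head() of each one before going further, because tables extracted from PDFs usually need cleaning (header rows that landed in the data, numbers read as strings, and so on).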

This week, I have seven tasks and questions for you to answer; I'll be back tomorrow with my solutions and explanations. The learning goals for this week include working with PDF files, nullable dtypes, plotting, and correlations.

The questions: