BW #106: Flu season

Get better at: String manipulation, working with dates, plotting, grouping, and pivot tables

We'll be holding monthly Pandas office hours on Sunday, February 23rd! If you're a paid subscriber, you can join me and ask any Pandas-related questions you have. Details will be sent on Friday. I hope that you'll join me there!

It's winter in the northern hemisphere, which means that it's also flu season. At nearly all of my recent meetings and corporate training classes, some participants were out sick, or were taking care of sick children. I've heard of several cases in which so many students and teachers were sick that the entire class was told to stay home.

It's not just flu, either: Plenty of other viruses have come out to play this winter, including RSV (respiratory syncytial virus).

And while we aren't talking about it this week, avian flu is causing a lot of trouble, and there are worries it might cause another pandemic (https://www.nytimes.com/2025/02/13/podcasts/the-daily/bird-flu-eggs.html?unlocked_article_code=1.yE4.cGJh.rHxFPrPOzeNg&smid=url-share).

The World Health Organization (WHO, https://who.int) publishes the FluID data set, which tracks flu and flu-related infections (including RSV) around the world. Our questions this week will examine this data, looking at how many cases have been found, which countries have the most cases, and how common RSV is compared with the flu.

Data and six questions

This week's data comes from the FluID program at the WHO:

https://www.who.int/teams/global-influenza-programme/surveillance-and-monitoring/influenza-surveillance-outputs

You can download the data from the link marked "Download the fluID dataset (CSV)":

https://xmart-api-public.who.int/FLUMART/VIW_FID?$format=csv

The data dictionary is available, also in CSV format, from a link on the same page:

https://xmart-api-public.who.int/FLUMART/VIW_FLU_METADATA?$format=csv

The learning goals for this week include string manipulation, working with dates, plotting, grouping, and pivot tables.

I'll be back tomorrow with my solutions, including (for paid subscribers) a downloadable copy of the Jupyter notebook I used to solve these problems, as well as a one-click link to open that notebook (and the data) in Google Colab.

Here are my six questions:

  • Create the data frame. Make sure that the ISO_WEEKSTARTDATE column is a datetime.
  • In each quarter of our data set, starting in 2020, which country had the greatest number of deaths from flu and flu-related viruses? Ignore weeks in which the maximum is 0.
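
If you want a starting point for these first two questions, here is a minimal sketch rather than my official solution. It assumes that pd.read_csv can parse ISO_WEEKSTARTDATE directly via parse_dates. The function names (load_fluid, top_country_per_quarter) and the COUNTRY and DEATHS columns in the toy sample are hypothetical stand-ins, so check the data dictionary for the real column names. Dropping quarters whose total is 0 is just one possible reading of the "ignore" instruction.

```python
import io

import pandas as pd

FLUID_URL = "https://xmart-api-public.who.int/FLUMART/VIW_FID?$format=csv"


def load_fluid(source=FLUID_URL):
    """Read the FluID CSV, parsing ISO_WEEKSTARTDATE as a datetime column."""
    return pd.read_csv(source, parse_dates=["ISO_WEEKSTARTDATE"], low_memory=False)


def top_country_per_quarter(df, country_col, deaths_col, start="2020-01-01"):
    """Return, per quarter from `start` onward, the country with the most deaths.

    `country_col` and `deaths_col` are passed in because the real column
    names are an assumption here; look them up in the data dictionary.
    """
    recent = df.loc[df["ISO_WEEKSTARTDATE"] >= pd.Timestamp(start)]
    quarter = recent["ISO_WEEKSTARTDATE"].dt.to_period("Q").rename("quarter")

    # Total deaths for each (quarter, country) pair
    totals = recent.groupby([quarter, country_col])[deaths_col].sum().reset_index()

    # One interpretation of "ignore": drop pairs whose total is 0
    totals = totals[totals[deaths_col] > 0]

    # Keep the row with the largest total within each quarter
    best = totals.loc[totals.groupby("quarter")[deaths_col].idxmax()]
    return best.set_index("quarter")


# Toy data with hypothetical column names, so the sketch runs offline
SAMPLE = """COUNTRY,ISO_WEEKSTARTDATE,DEATHS
A,2019-12-30,50
A,2020-01-06,3
B,2020-01-13,5
A,2020-04-06,0
B,2020-04-13,0
A,2020-07-06,2
B,2020-07-13,1
"""

demo = load_fluid(io.StringIO(SAMPLE))
print(top_country_per_quarter(demo, "COUNTRY", "DEATHS"))
```

To try it on the real data, call load_fluid() with no arguments (it downloads from the WHO URL above), then pass the actual country and deaths column names to top_country_per_quarter.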