BW #71: Holidays

[Administrative note: From what I can tell, the migration to Bamboo Weekly's new platform (on Ghost) was a success. There are a few things that I'm going to fix up, but I'm overall pretty happy with how smooth it was, and appreciate your patience as the dust settles on our new home. Please let me know if you see any problems or bugs with this system!]

Holidays are a funny thing: They're obvious and important to the people who celebrate them, but are somewhat invisible (and surprising) to the people who don't. On my first trip to China, I was told that the day following my flight home would be the Dragon Boat festival, and that I should leave early for the airport, to avoid terrible traffic. My last-minute approach to arriving at airports, combined with my ignorance of the importance of Dragon Boat, led to me almost missing my flight.

Depending on where you live, then, it might be super obvious that last week was the Jewish festival of Shavuot. Or that earlier this week began the Muslim festival of Eid al-Adha. Or that today is Juneteenth, the most recent official American holiday.

Given the number of holidays I've seen and heard about in just the last week, today's Bamboo Weekly is all about holidays. Specifically, we'll create a data frame with holidays from around the world, based on a database on PyPI, and then we'll run lots of queries about it.

Data and six questions

The original data comes from the holidays package on PyPI, which lets you retrieve the holidays for any country in any range of years. In order to query this data for all countries, you'll want to install the pycountry package as well, and use its two-letter (alpha-2) country codes to make your queries of the holidays package.
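
To make the relationship between the two packages concrete, here is a minimal sketch of how they might fit together. The helper names `holidays_for` and `collect_rows` are my own inventions for illustration, not part of either package, and I'm assuming that `holidays.country_holidays` raises `NotImplementedError` for a country it doesn't cover. Treat this as one possible starting point rather than the solution:

```python
def holidays_for(alpha2, years):
    """Return {date: holiday name} for one country, or {} if unsupported.

    Hypothetical helper; `years` can be a single year or an iterable of years.
    """
    # Imported lazily, so the rest of the sketch works without the package.
    import holidays

    try:
        return dict(holidays.country_holidays(alpha2, years=years))
    except NotImplementedError:
        # Assumption: this is what the package raises for unsupported countries.
        return {}


def collect_rows(countries, lookup, years):
    """Build (country name, alpha2, date, holiday name) tuples.

    `countries` is any iterable of objects with .name and .alpha_2
    attributes (such as pycountry.countries), and `lookup` is a
    function with the same signature as holidays_for.
    """
    rows = []
    for country in countries:
        found = lookup(country.alpha_2, years)
        for day, name in sorted(found.items()):
            rows.append((country.name, country.alpha_2, day, name))
    return rows
```

With both packages installed, something like `collect_rows(pycountry.countries, holidays_for, range(2010, 2025))` should give you a list of tuples that you can hand to pandas. Passing the lookup function as an argument also makes it easy to try the loop on a handful of countries before running it on all of them.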

I have six tasks and questions for you this week. The learning goals include creating a data frame from Python data, string handling, date handling, grouping, and joins.

I'll be back tomorrow, as usual, with my complete solutions and Jupyter notebook. Here are this week's tasks and questions:

  1. Create a data frame with four columns (country name, alpha2, date, and holiday name) for all countries, from the years 2010 through 2024. Use the pycountry module (from PyPI) to go through all of the countries in the world, and the holidays module (also from PyPI) to grab all of the holidays from there. The dates should be in a datetime column.
  2. Which countries have holidays in June 2024? Which of this month's holidays, if any, are celebrated in more than one country? Do we see any issues that might result in a miscount?