BW #53: Airport animals
How many animals entered Heathrow Airport in 2023? What do people bring into the airport? And how has this changed over the years?
I recently read a story in the Economist ("How to transport a rhino," https://www.economist.com/britain/2024/01/25/how-to-transport-a-rhino) about the various animals that are transported into and out of London's Heathrow Airport every year. There is, it turns out, a special place (HARC, the Heathrow Animal Reception Centre), run by the City of London, which employs 55 people and handles the import and transit of a wide variety of animals — from small insects to lions and horses.
The article was amusing, and led me to wonder where they had gotten their data. After all, the article said that more than 30 million butterfly pupae were transported through Heathrow in 2023. That number had to come from somewhere, right?
Friends, I'm delighted to say that I have managed to track down that data! Perhaps it also exists elsewhere, but I found it in a letter submitted by HARC to the Chair of the Environment, Food, and Rural Affairs Committee of the UK's House of Commons. The letter was submitted on November 1st of last year, so it isn't completely up to date. But unless your animal's passport is out of date (and yes, I've learned that there is such a thing as a "pet passport"), the data should still be interesting and fun to review.
Data and seven questions
The data that we'll be examining isn't available in either CSV or Excel format. Rather, it's buried inside of a letter in PDF format. The letter can be downloaded from here:
https://committees.parliament.uk/writtenevidence/126507/default/
And no, there is no filename or extension on that URL. Visiting that link should trigger a download of the PDF, at least in a normal browser. Fetching it with `wget` doesn't seem to work, however.
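If you would rather fetch the file from Python than from a browser, here is a minimal sketch using only the standard library. It rests on an unverified assumption: that the server turns away clients which don't identify themselves as a browser, which is one common reason command-line downloads fail. The `User-Agent` value and the output filename `harc_letter.pdf` are placeholders I made up, and the request may still be refused.

```python
import shutil
import urllib.request

PDF_URL = "https://committees.parliament.uk/writtenevidence/126507/default/"


def build_request(url=PDF_URL):
    # Assumption: a browser-like User-Agent header is what the server wants.
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


def download_pdf(filename="harc_letter.pdf", url=PDF_URL):
    # The URL has no filename of its own, so we choose one (hypothetical name).
    with urllib.request.urlopen(build_request(url)) as response, \
            open(filename, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk
    return filename
```

Calling `download_pdf()` would then save the letter in the current directory. If it raises an HTTP error, downloading via the browser remains the reliable route.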
I didn’t see a data dictionary for this information, but I think that it’s mostly self-explanatory. I did look up some of the animal-related terms, and will happily bore you with the details, if you like.
This week, I have seven tasks and questions for you to answer based on the data.
The learning goals for this week include working with PDF files, indexes and multi-indexes, and cleaning data.
I’ll be back tomorrow with detailed solutions to all of the questions, along with the Jupyter notebook I created in solving them.
- Turn the table on page 3 of the PDF into a data frame. I used `tabula-py` (a wrapper around the `tabula-java` package written in Java), available on PyPI (https://pypi.org/project/tabula-py/). I also used JPype1 (https://pypi.org/project/JPype1/), which improved the Python-to-Java communication.
- The final column was mis-parsed, at least on my system, such that it contains information for both consignments and animals from 2023, separated by spaces. Replace this one column with two columns.
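To give a sense of the shape of these first two tasks, here is a rough sketch rather than a finished solution. It assumes the PDF was saved locally under a name of your choosing, that `tabula.read_pdf` finds a single table on page 3, and that the two values in the final column are separated by whitespace. The helper names and the new column names (`2023 consignments`, `2023 animals`) are my own inventions, not names taken from the letter.

```python
import pandas as pd


def read_page_three(pdf_path):
    # Imported here so the splitting helper below can be tried without Java.
    import tabula

    # read_pdf returns a list of data frames, one per table it detects.
    tables = tabula.read_pdf(pdf_path, pages=3)
    return tables[0]  # assumption: the table we want is the first one found


def split_last_column(df, new_names=("2023 consignments", "2023 animals")):
    last = df.columns[-1]
    # Split each value at the first run of whitespace into two new columns.
    parts = df[last].astype(str).str.strip().str.split(n=1, expand=True)
    parts.columns = list(new_names)
    # Drop the mis-parsed column and append its two replacements.
    return pd.concat([df.drop(columns=last), parts], axis=1)
```

With a small made-up frame such as `pd.DataFrame({"Species": ["Cats"], "2023": ["10 25"]})`, `split_last_column` returns the columns `Species`, `2023 consignments` and `2023 animals`. The new values are still strings, so converting them to numbers is a separate cleaning step.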