BW #62: Economic report card (solution)

This week, we looked at the latest “World Economic Outlook” report from the International Monetary Fund (IMF, https://imf.org) on the world economy, giving both historical data and future projections on a variety of economic topics. They issued a complete report, as well as an executive summary, which were extensively covered by a variety of publications. You can read all about it, including downloading those written reports, here: https://www.imf.org/en/Publications/WEO/Issues/2024/04/16/world-economic-outlook-april-2024.

As data people, the reports aren’t as engaging as the underlying data. Fortunately, the IMF supplies all of the data that they used, allowing us to look at individual countries and also compare across them.

I decided to examine how members of the G20 (19 countries, plus the EU and the African Union — demonstrating that economists are also subject to off-by-one errors) have looked over the last few years. In particular, I wanted to better understand how well these countries have done in the last few years of post-pandemic economic weirdness.

Data and seven questions

The data we used was mainly from the IMF’s download page for their database:

https://www.imf.org/en/Publications/WEO/weo-database/2024/April/download-entire-database

Here are the seven tasks and questions that I gave you. A link to the Jupyter notebook I used to solve the problems is at the end of this post.

It takes 6-8 hours to research and write each edition of Bamboo Weekly. Thanks to those of you who support my work with a paid subscription! Even if you can’t, it would mean a lot if you would share BW with your Python- and Pandas-using colleagues. Thanks!

Download the full database from the IMF. If you're like me, you'll find that you have to jump through a number of hoops in order to get it loaded into Pandas. (I've never seen anything like this before, to be honest, and I'm surprised that the IMF distributed such a weird file.) Create the data frame such that it has a multi-index made up of the "Country" and "Subject Descriptor" columns, and the dtype of every column with a year heading is a float type.

Let’s start with the easiest thing, namely loading up Pandas:

import pandas as pd

Loading up the file shouldn’t be a problem. We just go to the IMF’s site, click on the link, get a file, and we’re off to the races, right?

No. In this case, OMG, definitely not. The “download entire database” page offers the chance to download the file in tab-delimited format. I somehow missed that, downloaded the file, and saw that it had an “xls” extension, meaning Excel. I thus decided to use “read_excel” to read it into Pandas and got an error.

At first, I thought that it was because the version of Excel file they were using wasn’t right for my version of Pandas. I tried a few different Excel drivers, and none of them worked. Different libraries, such as “openpyxl” and “xlrd”, can handle different formats and problems with those formats. In this case, I struck out completely.

Hmm, I thought — maybe it’s really a CSV file, or it’s somehow tab-delimited? I looked at it, and even used the Unix “file” command, and both made it very clear that we’re talking about a binary file that is very much not in CSV format.

I thought that maybe things would be better or easier if I were to copy the link from the site, downloading the file using “wget” or a similar program, rather than my browser. (Because, as you know, browsers are just awful at downloading files from the Web, right?) That file had an “ashx” extension, indicating that it was likely produced on-the-fly from an ASP.NET application. However, it contained precisely the same data.

So now what? I decided that it might be worth loading this weird file into Excel. I got a really weird warning:

That scared me off a bit, so I didn’t open the file. Instead, I renamed it to have an “.xlsx” extension, which is for more modern versions of Excel. However, Excel refused to open that file at all. It didn’t even give me a warning.

So I went back to the downloaded file, opening it with Excel, and clicking “Yes” to the warning dialog box. And whadaya know, it opened just fine. (I tried to upload the document to Google Sheets, it refused to open the file.)

I quickly used “save as” to save the file in Excel format to another filename, and was relieved to find that things then worked:

df = (pd
      .read_excel('new-WEOApr2024all.xls')
)

Well, they almost worked. Loading the data worked just fine; I got a data frame from this command. But the IMF decided to use the two-character string “--” to represent missing values. This meant that every column with numeric data, which should have had a dtype of “np.float64” actually had a dtype of “object”, which means that the values were treated as strings, because there was no way to convert them all to integers or floats.

I passed the “na_values” keyword argument to “read_excel”, telling it that in addition to such strings as “NA” and “NaN”, I wanted “--” to be treated as a NaN value:

df = (pd
      .read_excel('/Users/reuven/Downloads/new-WEOApr2024all.xls', 
                   na_values=['--'])
     )

That worked! Suddenly, all of the columns with years for headings (i.e., the numeric data) had dtypes of “np.float64”, as should be the case.

Finally, I asked you to set the “Country” and “Subject Descriptor” columns as the index. This means creating a multi-index of more than one column. We can always set it later on, but it’s very convenient to do so when we’re reading the file in, with the “set_index” keyword argument:

df = (pd
      .read_excel('/Users/reuven/Downloads/new-WEOApr2024all.xls', 
                   na_values=['--'])
      .set_index(['Country', 'Subject Descriptor'])
     )

We end up with a data frame containing 8,626 rows and 58 columns, and with dtypes that are what we expect and want:

WEO Country Code                  object
ISO                               object
WEO Subject Code                  object
Subject Notes                     object
Units                             object
Scale                             object
Country/Series-specific Notes     object
1980                             float64
1981                             float64
1982                             float64
1983                             float64
1984                             float64
1985                             float64
1986                             float64
1987                             float64
1988                             float64
1989                             float64
1990                             float64
1991                             float64
1992                             float64
1993                             float64
1994                             float64
1995                             float64
1996                             float64
1997                             float64
1998                             float64
1999                             float64
2000                             float64
2001                             float64
2002                             float64
2003                             float64
2004                             float64
2005                             float64
2006                             float64
2007                             float64
2008                             float64
2009                             float64
2010                             float64
2011                             float64
2012                             float64
2013                             float64
2014                             float64
2015                             float64
2016                             float64
2017                             float64
2018                             float64
2019                             float64
2020                             float64
2021                             float64
2022                             float64
2023                             float64
2024                             float64
2025                             float64
2026                             float64
2027                             float64
2028                             float64
2029                             float64
Estimates Start After            float64
dtype: object

Whew! It worked — but it really annoyed me that I spent so much time trying to do something that should be so trivial. Moreover, this required manual intervention, meaning that it’s inappropriate for an automated series of tasks.

Which five countries had the lowest inflation (i.e., "Inflation, average consumer prices" where the units are "Percent change") in 2023? (Is that necessarily good?) Which countries had the highest inflation in 2023?

Inflation is clearly higher than it was before the pandemic started. In many countries, it has gone down quite a bit in the last few years, though. I was thus curious to know where inflation stood across the world — which countries have the lowest inflation, and which have the highest?

The first thing we have to do is select those rows for which the “Subject Descriptor” is “Inflation, average consumer prices.” Normally, that wouldn’t be too hard — except that “Subject Descriptor” is now part of our multi-index, which means that we cannot simply compare its value with what we want.

We also cannot use “.loc” to retrieve the rows that we want, because “Subject Descriptor” is the inner part of our multi-index.

The solution is to use “xs”, which lets us retrieve based on any part (or parts) of a multi-index. We indicate the value that we want to match, as well as the level that we want to match on, specifying it either by number (starting with 0) or by name (if the multi-index components are named):

(
    df
    .xs('Inflation, average consumer prices', 
        level='Subject Descriptor')
)

It turns out, though, that there are two rows in our data file for this measure, each with a different set of units. (Maybe I should have added “Units” as a third dimension in our multi-index!) We thus need to keep only those rows that have “Percent change” for “Units”, which I can do using a combination of “.loc” and “lambda”:

(
    df
    .xs('Inflation, average consumer prices', 
        level='Subject Descriptor')
    .loc[lambda df_: df_['Units'] == 'Percent change']
)

We now have only the rows that are of interest to us. But I was interested in finding out about the year 2023. I can use square brackets to retrieve just that year:

(
    df
    .xs('Inflation, average consumer prices', 
        level='Subject Descriptor')
    .loc[lambda df_: df_['Units'] == 'Percent change']
    [2023]
)

This returns a series. I could then use “sort_values” to get the values, then look at the top and bottom of that sorted series. However, we’ll have a bunch of NaN values in there. So we could first run “dropna” and then “sort_values” — but it’s easier in many ways just to run “nsmallest”:

(
    df
    .xs('Inflation, average consumer prices', 
        level='Subject Descriptor')
    .loc[lambda df_: df_['Units'] == 'Percent change']
    [2023]
    .nsmallest()
)

We get the following results:

Country
Turkmenistan   -1.738
Yemen          -1.225
Seychelles     -1.035
Bahrain         0.075
China           0.228
Name: 2023, dtype: float64

If you’ve been living in an inflationary economy for a few years, this might sound great. Negative inflation (i.e., “deflation”) especially sounds terrific, right? Prices would fall each month, rather than rise.

But as good as deflation sounds, it’s actually pretty awful. Why would you buy something now, if you know that it’ll be cheaper next month, and much cheaper in a year? If everyone puts of buying things then the economy has a real problem getting jump-started.

Central banks want to limit inflation, but they typically don’t want it to go away completely. They keep talking about getting inflation levels to about 2 percent as a good balance, such that prices rise a bit, but not too much.

How about the most inflationary economies? If you’ve been complaining about inflation in your country, then you might well need to sit down before running this query, which uses “nlargest”:

(
    df
    .xs('Inflation, average consumer prices', 
        level='Subject Descriptor')
    .loc[lambda df_: df_['Units'] == 'Percent change']
    [2023]
    .nlargest()
)

The results:

Country
Zimbabwe     667.361
Venezuela    337.458
Sudan        171.471
Argentina    133.489
T¸rkiye       53.859
Name: 2023, dtype: float64

That’s pretty darned high. Turkey — often written as Türkiye, whose “ü” didn’t survive the transformation of file formats — had a very high rate of inflation last year, which many were sure would affect their presidential election, but didn’t. Argentina’s rate is even higher (133 percent!), which almost certainly did help Javier Milei to win the presidency there.

Here’s a fantastic story from NPR about how Brazil conquered hyperinflation a number of years ago using a fake currency: https://www.npr.org/sections/money/2010/10/01/130267274/the-friday-podcast-how-four-drinking-buddies-saved-brazil

BW #62: Economic report card (solution)

Data and seven questions

Which five countries had the lowest inflation (i.e., "Inflation, average consumer prices" where the units are "Percent change") in 2023? (Is that necessarily good?) Which countries had the highest inflation in 2023?

Read next

BW #62: Economic report card

BW #63: Ukraine aid