BW #54: Household debt
How much money do Americans owe, and for what? This week, we examine the NY Federal Reserve's latest report on household debt and credit.
Debt is a fixture of the modern world: We borrow money to start businesses, to buy houses (mortgages), to buy cars, and (in some countries) to get a university education. Even more convenient than applying for a loan, we can often use a credit card; so long as we pay back the balance in a short period of time, we're not in any trouble.
The New York Federal Reserve is home to the "Center for Microeconomic Data," which researches (among other things) the kinds of debt and credit that Americans hold. Last week, they came out with their latest Household Debt and Credit report, which covers through the fourth (and final) quarter of 2023.
This week, we'll look at the data that they used to create their report. Along the way, we'll get a better understanding of what kind of debt Americans have, how their debt load has changed over the years, what they owe money on, and in which states people (on average) owe the most.
Data and six questions
The NY Fed's HHDC report, including background information, is here:
https://www.newyorkfed.org/microeconomics/hhdc
You can download the specific report about household debt and credit as a PDF file:
/content/files/medialibrary/Interactives/householdcredit/data/pdf/hhdc_2023q4.pdf
The data on which the report is based, and which we'll be using this week, is in the form of an Excel spreadsheet:
/content/files/medialibrary/interactives/householdcredit/data/xls/hhd_c_report_2023q4.xlsx
A data dictionary describing the data is downloadable from:
/content/files/medialibrary/interactives/householdcredit/data/pdf/data_dictionary_hhdc.pdf
I have six tasks and questions for you this week. The learning goals include: Importing data from Excel into Pandas, cleaning data, plotting, grouping, and window functions.
I'll be back tomorrow with my detailed solutions and Jupyter notebook.
Meanwhile, here are this week's questions and tasks:
Create a dictionary of data frames from the Excel spreadsheet; the keys should be the sheet names from the Excel file, and the values should be data frames containing the data. Only keep those with the word "Data" in their names. Make sure that each data frame's column names are taken from the sheet, and that the first column is turned into the index.
From the data frame describing the chart on page 3 of the report, create a stacked bar plot replicating that chart. Use the same colors as the NY Fed's chart. Show all of the index values (on the x axis), but rotate them 60 degrees, make the chart larger, and the font size smaller, so that all are visible.
Again, from the page 3 data: How often, as a percentage, was there a quarter in which the amount of household debt declined vs. increased when compared against the previous quarter?
Now let's look at data from page 10 of the report, which shows (among other things) the total credit available to Americans on their credit cards, and how much of that is being used. Create a line plot showing these two measures ("CC Balance" and "CC Limit"), but plot one point for each year, calculating the mean on a year-by-year basis. Note that the data frame will require some cleaning; what was in the original Excel file that would cause this trouble?
Now let's consider data from page 12. The "ALL" column describes the percentage of all loans 90+ days unpaid in that quarter; the other columns break that down into loan types, and add up to that "ALL" column. Calculate the mean percentage that each loan type takes up for each year. Then find, for each year, the loan type that constituted the greatest proportion of debt. Which loan type has most often been the highest percentage of delinquent loans?
Page 32 of the report shows the total debt balance, per capita, in 10 different states, as well as the overall rate for the United States. However, our import from Excel used the first row as column names. Fix that problem, setting (or resetting) the column names to be correct and using the states as the index. Interpolate NaN values. Show which states' residents have had the greatest total cumulative increase in per-capita debt over the years.
- Create a dictionary of data frames from the Excel spreadsheet; the keys should be the sheet names from the Excel file, and the values should be data frames containing the data. Only keep those with the word "Data" in their names. Make sure that each data frame's column names are taken from the sheet, and that the first column is turned into the index.
- From the data frame describing the chart on page 3 of the report, create a stacked bar plot replicating that chart. Use the same colors as the NY Fed's chart. Show all of the index values (on the x axis), but rotate them 60 degrees, make the chart larger, and the font size smaller, so that all are visible.