BW #76: Aging legislators

The bombshell news over the last few days was, of course, the withdrawal of President Joe Biden from the election this November, and the swift pivot to Vice President Kamala Harris as the presumptive nominee. Biden's age had long worried voters, but his debate performance last month put a lot of people on edge, leaving them wondering whether he could win the election. At the end of an intensive, weeks-long pressure campaign, Biden withdrew from the race, and we're still figuring out just how much things have changed.

Of course, Biden isn't the only elderly politician in Washington, DC. I thought that it might be interesting to analyze detailed data about members of Congress, and find out how old they are, and how old they've been over the years. Better yet, I found a data set in YAML that forced me to perform a variety of unusual transformations just to work with the data. The result? This week's questions are about the ages of US legislators, as well as wrestling odd data into submission.

Data and eight questions

This week's data set comes from a GitHub repository that is remarkably detailed:

https://github.com/unitedstates/congress-legislators

We will only be looking at two of the files, legislators-historical.yaml and legislators-current.yaml, both (as the suffix indicates) in YAML format.

This week's learning goals involve working with YAML, manipulating complex data within a Pandas data frame, working with time-series data, pivot tables, and plotting.

Here are my eight questions; as usual, I'll be back on Thursday with my detailed solutions and Jupyter notebook:

  • Read the two YAML files (legislators-historical and legislators-current) into a single data frame.
  • Three columns (id, bio, and name) contain Python dicts. Expand each dict into new columns, one per key, and then remove the original columns. Then take the "terms" column, which contains a list of dicts, and expand it so that you have multiple rows per legislator, one term per row. (The rest of the data for the legislator will be duplicated.) Finally, set the "bioguide" column to be the index.
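
To make the shape of these first two tasks concrete, here is a minimal sketch of one possible approach, not the official solution. It runs on two tiny, invented YAML snippets that stand in for the real files. The helper names (load_legislators, expand_dict_columns, one_row_per_term) are hypothetical, and the sketch assumes that every record has all of the nested keys and that no key names collide across the dicts.

```python
import pandas as pd
import yaml

# Invented sample records that mimic the nesting described above.
# The real files are much larger and carry many more keys.
HISTORICAL_YAML = """
- id: {bioguide: X000001}
  name: {first: Alex, last: Example}
  bio: {birthday: '1900-01-01', gender: M}
  terms:
    - {type: rep, start: '1941-01-03', end: '1943-01-03', state: NY}
    - {type: sen, start: '1943-01-03', end: '1949-01-03', state: NY}
"""

CURRENT_YAML = """
- id: {bioguide: Y000002}
  name: {first: Sam, last: Sample}
  bio: {birthday: '1960-06-15', gender: F}
  terms:
    - {type: rep, start: '2021-01-03', end: '2023-01-03', state: CA}
"""


def load_legislators(sources):
    """Parse each YAML source (a list of legislator dicts) into one data frame."""
    frames = [pd.DataFrame(yaml.safe_load(src)) for src in sources]
    return pd.concat(frames, ignore_index=True)


def expand_dict_columns(df, columns):
    """Replace each dict-valued column with one new column per dict key."""
    for col in columns:
        # Assumes every cell in the column is a dict (no missing values).
        expanded = pd.DataFrame(df[col].tolist(), index=df.index)
        df = df.drop(columns=col).join(expanded)
    return df


def one_row_per_term(df):
    """Turn the list-of-dicts 'terms' column into one row per term."""
    # explode() repeats the other columns for each list element;
    # ignore_index=True keeps the index unique for the join that follows.
    df = df.explode("terms", ignore_index=True)
    return expand_dict_columns(df, ["terms"])


df = load_legislators([HISTORICAL_YAML, CURRENT_YAML])
df = expand_dict_columns(df, ["id", "bio", "name"])
df = one_row_per_term(df).set_index("bioguide")

print(df[["first", "last", "type", "start", "end"]])
```

With the real data, you would pass open file objects for legislators-historical.yaml and legislators-current.yaml to yaml.safe_load instead of strings. Expect to adjust the sketch if some legislators lack a key or if two dicts share a key name, since join raises an error on overlapping column names.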