The world economy is going through some weird and turbulent times. (Things sound like they'll get even more turbulent later today, when Donald Trump announces massive tariffs for anything imported into the United States.) I've heard from no small number of people that even though many companies are profitable, their nervousness, combined with advances in AI, has led them to slow hiring, or even lay off existing employees.
That said, even in the age of sky-high promises about AI replacing programmers, people who know how to code will still be needed. And certainly right now, in early 2025, there's a need for people who know how to code.
That's why the Washington Post last month (https://wapo.st/42shmfm) ran an article whose headline ("More than a quarter of computer-programming jobs just vanished. What happened?") indicated some surprise and skepticism regarding the decline in programming jobs. Can it be that these jobs have really gone away? (Spoiler: No, not really.)
The article certainly encouraged me to look into this issue, and to see if we could analyze the data ourselves, to find out what has happened to programmer jobs. This week, we'll look at coding-related jobs in the US – how many people work in them, where they work, and how much that has changed in the last decade.
Data and five questions
This week's data comes from the Bureau of Labor Statistics at the Department of Labor, specifically their data on occupational employment and wage statistics (https://www.bls.gov/oes/tables.htm). They publish data about once each year, although from what I can tell, more recent data is available from the Census Bureau.
We'll mostly look at data from 2023, which you can download by clicking on "all data" from the "May 2023" section of the OES page, or by using this link:
https://www.bls.gov/oes/special-requests/oesm23all.zip
We'll also spend a bit of time looking at data from 2013, and comparing it with the 2023 data, which you can similarly download by clicking on the link next to "all data" in the 2013 section, or by clicking here:
https://www.bls.gov/oes/special-requests/oesm13all.zip
This week's learning goals include: Working with Excel, regular expressions, optimizing speed, using PyArrow, grouping, joining, and plotting.
As always, paid subscribers can download the data file from a link at the bottom of this message.
Here are this week's five questions and tasks:
- Read in the data from Excel. We only need some of the columns (AREA_TITLE, AREA_TYPE, OCC_TITLE, I_GROUP, TOT_EMP, JOBS_1000, O_GROUP, PCT_TOTAL, and A_MEAN). Treat "*", "**", and "#" as NaN values. Does it take less time to read the data if we limit the columns? How much does limiting the columns change the size of the data in memory?
- Store the data in Feather and Parquet format, and read from these files back into Pandas. Does it take less time to read from these formats? Which is faster?