BW #105: Federal employees
Get better at: String manipulation, working with multiple files, joins, grouping, and plotting
Administrative note: With this issue, we begin year 3 of Bamboo Weekly! Thanks to all of you who have subscribed, and especially to all of you who are supporting my work with a paid subscription. I'm constantly trying to make Bamboo Weekly a better tool for improving your skills with data analysis and Pandas.
Among other things: I'm working to tag all back issues of BW, so that you can more easily find questions to improve particular Pandas skills. Paid subscribers also get a link that loads my Jupyter notebook into Google Colab for easier experimenting and learning.
If you're enjoying (and learning from) Bamboo Weekly, please tell your friends and colleagues about it.
The US government is huge. But it has to be, because it is involved in so many things -- including many that we don't know or think about. I know that I've mentioned it before, but "The Fifth Risk," by Michael Lewis, makes it clear that before you complain about the US government, you should learn about how its experts have saved and improved countless lives. And also how they think about fixing long-term problems that most of us don't know about, and that no for-profit company will tackle.
Of course, not everyone subscribes to this theory. Elon Musk has been tasked with cutting the size of government, and he has started to do it, despite federal judges saying that he and his team are breaking multiple laws, and need to justify their actions in court before continuing.
But the chaos has already begun. I'm not particularly connected to the US government, and yet I have seen numerous posts from friends and colleagues on social media describing their research as no longer funded. The New York Times (https://www.nytimes.com/interactive/2025/02/11/us/politics/trump-musk-doge-federal-workers.html?unlocked_article_code=1.wU4.JA39.2G6BjL6fH5Rm&smid=url-share) described which federal workers have already been cut.
This week, we'll thus look at data about the federal workforce, using data from the Office of Personnel Management (OPM), which is in charge of federal employees. We'll look at where they live, what sorts of work they do, and how much money they earn.
Of course, the real purpose of Bamboo Weekly is to help you improve your skills with data analysis. And this week's data-analysis learning goals include:
- Working with multiple files (https://www.bambooweekly.com/tag/multiple-files/)
- Grouping (https://www.bambooweekly.com/tag/grouping/)
- Joins (https://www.bambooweekly.com/tag/joins/)
- Plotting (https://www.bambooweekly.com/tag/plotting/)
- String manipulation (https://www.bambooweekly.com/tag/strings/)
A link to the data file is at the bottom of this post. I'll be back tomorrow with my complete solutions, including (for paid subscribers) the Jupyter notebook I used to solve the problems, in both downloadable format and in a one-click format that opens in Google Colab.
Data and six questions
The data comes from OPM's "Fedscope" project, published on a regular basis to help the public understand the number, location, salaries, and jobs of federal employees. You can get the data from the OPM site:
https://www.opm.gov/data/datasets/
The most recent data is from March 2024, and was posted in October of last year. Click on the icon under "downloads" to get a zipfile, or click here:
https://www.opm.gov/data/datasets/Files/716/3679c569-8492-46db-934a-eb3882647abb.zip
This zipfile includes a number of data files, all with the TXT suffix. There is also a PDF data dictionary that goes through each of the fields in the individual files.
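If you would rather not unpack the archive by hand, here is a rough sketch of reading files straight out of a zipfile. It assumes the TXT files are comma-separated with a header row, which is worth confirming against the data dictionary. The helper names (`list_txt_members`, `read_txt_from_zip`) and the tiny in-memory archive are made up for illustration; with the real download, you would pass the path of the zipfile instead.

```python
import io
import zipfile

import pandas as pd


def list_txt_members(zip_source):
    """Return the names of all members ending in .txt (any case)."""
    with zipfile.ZipFile(zip_source) as zf:
        return [name for name in zf.namelist()
                if name.upper().endswith('.TXT')]


def read_txt_from_zip(zip_source, member_name, **read_csv_kwargs):
    """Read one member of a zip archive into a data frame.

    Assumes the member is comma-separated text with a header row.
    """
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(member_name) as f:
            return pd.read_csv(f, **read_csv_kwargs)


# A tiny in-memory archive standing in for the real download
# (file names and contents here are invented)
demo_zip = io.BytesIO()
with zipfile.ZipFile(demo_zip, 'w') as zf:
    zf.writestr('DEMO_FACT.TXT', 'LOC,SALARY\n01,50000\n02,60000\n')
    zf.writestr('DEMO_LOOKUP.txt', 'LOC,LOCT\n01,Place One\n02,Place Two\n')
    zf.writestr('dictionary.pdf', 'not a data file')

txt_names = list_txt_members(demo_zip)

# Read code columns as strings, so that leading zeros survive
demo_fact = read_txt_from_zip(demo_zip, 'DEMO_FACT.TXT', dtype={'LOC': str})
print(txt_names)
print(demo_fact)
```

Passing `dtype` for the code columns matters if you plan to join on them later: a code read as the integer 1 in one file will not match the string '01' in another.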
- Create a data frame describing all federal employees from the main FACTDATA_MAR2024.TXT file. For each of the columns 'LOC', 'AGELVL', 'EDLVL', 'LOSLVL', 'OCC', 'SALLVL', and 'STEMOCC', load the corresponding data file (i.e., DTname.txt) into a data frame, and combine it with the main data frame. Do the same for DTagy.txt, which should be combined using the AGYSUB column in the main file.
- Find the 20 top-level agencies with the greatest number of employees. Print their names ("agency translation") and the number of people who work there, with commas before every three digits.
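To warm up for these two questions, here is a minimal sketch of the general pattern on toy data, not a full solution. It assumes each row of the main file describes one employee, that each lookup file shares a key column with the main file, and that the agency name lives in a column called 'AGYT'. Those column names, and every value below, are assumptions for illustration; check the data dictionary for the real ones.

```python
import pandas as pd

# Toy stand-in for the main file (all values made up)
fact = pd.DataFrame({
    'AGYSUB': ['AA01', 'AA02', 'BB01', 'BB01', 'AA01'],
    'LOC': ['01', '02', '01', '01', '02'],
})

# Toy stand-ins for two lookup files, keyed by the column they join on
lookups = {
    'LOC': pd.DataFrame({
        'LOC': ['01', '02'],
        'LOCT': ['Place One', 'Place Two'],
    }),
    'AGYSUB': pd.DataFrame({
        'AGYSUB': ['AA01', 'AA02', 'BB01'],
        'AGYT': ['Agency A', 'Agency A', 'Agency B'],
    }),
}

# Left-join each lookup onto the main frame, so no employee rows are lost
combined = fact
for key, lookup_df in lookups.items():
    combined = combined.merge(lookup_df, on=key, how='left')

# Count rows per agency name, largest first, keeping at most 20
top_agencies = combined['AGYT'].value_counts().head(20)

# The ',' format specifier inserts thousands separators
formatted = top_agencies.map('{:,}'.format)
print(formatted)

print('{:,}'.format(1234567))   # 1,234,567
```

A left join keeps every row of the main frame even when a code has no match in the lookup table; the unmatched rows get NaN in the new columns, which makes gaps easy to spot.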